GLYPHOSATE-TOLERANT 5-ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE SYNTHASES

This is a continuation-in-part of a U.S. patent application Ser. No. 07/749,611, filed Aug. 28, 1991, now abandoned, which is a continuation-in-part of U.S. patent application Ser. No. 07/576,537, filed Aug. 31, 1990, now abandoned.

BACKGROUND OF THE INVENTION

This invention relates in general to plant molecular biology and, more particularly, to a new class of glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthases.

Recent advances in genetic engineering have provided the requisite tools to transform plants to contain foreign genes. It is now possible to produce plants which have unique characteristics of agronomic importance. Certainly, one such advantageous trait is more cost effective, environmentally compatible weed control via herbicide tolerance. Herbicide-tolerant plants may reduce the need for tillage to control weeds thereby effectively reducing soil erosion.

One herbicide which is the subject of much investigation in this regard is N-phosphonomethylglycine commonly referred to as glyphosate. Glyphosate inhibits the shikimic acid pathway which leads to the biosynthesis of aromatic compounds including amino acids, plant hormones and vitamins. Specifically, glyphosate curbs the conversion of phosphoenolpyruvic acid (PEP) and 3-phosphoshikimic acid to 5-enolpyruvyl-3-phosphoshikimic acid by inhibiting the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (hereinafter referred to as EPSP synthase or EPSPS). For purposes of the present invention, the term "glyphosate" should be considered to include any herbicidally effective form of N-phosphonomethylglycine (including any salt thereof) and other forms which result in the production of the glyphosate anion in planta.

It has been shown that glyphosate-tolerant plants can be produced by inserting into the genome of the plant the capacity to produce a higher level of EPSP synthase in the chloroplast of the cell (Shah et al., 1986) which enzyme is preferably glyphosate-tolerant (Kishore et al., 1988). Variants of the wild-type EPSPS enzyme have been isolated which are glyphosate-tolerant as a result of alterations in the EPSPS amino acid coding sequence (Kishore and Shah, 1988; Schulz et al., 1984; Sost et al., 1984; Kishore et al., 1986). These variants typically have a higher Ki for glyphosate than the wild-type EPSPS enzyme which confers the glyphosate-tolerant phenotype, but these variants are also characterized by a high Km for PEP which makes the enzyme kinetically less efficient (Kishore and Shah, 1988; Sost et al., 1984; Schulz et al., 1984; Kishore et al., 1986; Sost and Amrhein, 1990). For example, the apparent Km for PEP and the apparent Ki for glyphosate for the native EPSPS from E. coli are 10 µM and 0.5 µM while for a glyphosate-tolerant isolate having a single amino acid substitution of an alanine for the glycine at position 96 these values are 220 µM and 4.0 mM, respectively. A number of glyphosate-tolerant plant variant EPSPS genes have been constructed by mutagenesis. Again, the glyphosate-tolerant EPSPS was impaired due to an increase in the Km for PEP and a slight reduction of the Vmax of the native plant enzyme (Kishore and Shah, 1988) thereby lowering the catalytic efficiency (Vmax/Km) of the enzyme. Since the kinetic constants of the variant enzymes are impaired with respect to PEP, it has been proposed that high levels of overproduction of the variant enzyme, 40-80 fold, would be required to maintain normal catalytic activity in plants in the presence of glyphosate (Kishore et al., 1988).

While such variant EPSP synthases have proved useful in obtaining transgenic plants tolerant to glyphosate, it would be increasingly beneficial to obtain an EPSP synthase that is highly glyphosate-tolerant while still kinetically efficient such that the amount of the glyphosate-tolerant EPSPS needed to be produced to maintain normal catalytic activity in the plant is reduced or that improved tolerance be obtained with the same expression level.

Previous studies have shown that EPSPS enzymes from different sources vary widely with respect to their degree of sensitivity to inhibition by glyphosate. A study of plant and bacterial EPSPS enzyme activity as a function of glyphosate concentration showed that there was a very wide range in the degree of sensitivity to glyphosate. The degree of sensitivity showed no correlation with any genus or species tested (Schulz et al., 1985). Insensitivity to glyphosate inhibition of the activity of the EPSPS from the Pseudomonas sp. PG2982 has been reported but with no details of the studies (Fitzgibbon, 1988). In general, while such natural tolerance has been reported, there is no report suggesting the kinetic superiority of the naturally occurring bacterial glyphosate-tolerant EPSPS enzymes over those of mutated EPSPS enzymes nor have any of the genes been characterized. Similarly, there are no reports on the expression of naturally glyphosate-tolerant EPSPS enzymes in plants to confer glyphosate tolerance.

For purposes of the present invention the term "mature EPSP synthase" relates to the EPSPS polypeptide without the N-terminal chloroplast transit peptide. It is now known that the precursor form of the EPSP synthase in plants (with the transit peptide) is expressed and upon delivery to the chloroplast, the transit peptide is cleaved yielding the mature EPSP synthase. All numbering of amino acid positions is given with respect to the mature EPSP synthase (without chloroplast transit peptide leader) to facilitate comparison of EPSPS sequences from sources which have chloroplast transit peptides (i.e., plants and fungi) to sources which do not utilize a chloroplast targeting signal (i.e., bacteria).

In the amino acid sequences which follow, the standard single letter or three letter nomenclature is used. All peptide structures represented in the following description are shown in conventional format in which the amino group at the N-terminus appears to the left and the carboxyl group at the C-terminus at the right. Likewise, amino acid nomenclature for the naturally occurring amino acids found in protein is as follows: alanine (Ala;A), asparagine (Asn;N), aspartic acid (Asp;D), arginine (Arg;R), cysteine (Cys;C), glutamic acid (Glu;E), glutamine (Gln;Q), glycine (Gly;G), histidine (His;H), isoleucine (Ile;I), leucine (Leu;L), lysine (Lys;K), methionine (Met;M), phenylalanine (Phe;F), proline (Pro;P), serine (Ser;S), threonine (Thr;T), tryptophan (Trp;W), tyrosine (Tyr;Y), and valine (Val;V). An "X" is used when the amino acid residue is unknown and parentheses designate that an unambiguous assignment is not possible and the amino acid designation within the parentheses is the most probable estimate based on known information.

The term "nonpolar" amino acids includes alanine, valine, leucine, isoleucine, proline, phenylalanine, tryptophan, and methionine. The term "uncharged polar" amino acids includes glycine, serine, threonine, cysteine, tyrosine, asparagine and glutamine. The term "charged polar" amino acids includes the "acidic" and "basic" amino acids. The term "acidic" amino acids includes aspartic acid and glutamic
acid. The term "basic" amino acid includes lysine, arginine and histidine. The term "polar" amino acids includes both "charged polar" and "uncharged polar" amino acids.

Deoxyribonucleic acid (DNA) is a polymer comprising four mononucleotide units, dAMP (2'-Deoxyadenosine-5'-monophosphate), dGMP (2'-Deoxyguanosine-5'-monophosphate), dCMP (2'-Deoxycytidine-5'-monophosphate) and dTMP (2'-Deoxythymidine-5'-monophosphate) linked in various sequences by 3',5'-phosphodiester bridges. The structural DNA consists of multiple nucleotide triplets called "codons" which code for the amino acids. The codons correspond to the various amino acids as follows: Arg (CGA, CGC, CGG, CGT, AGA, AGG); Leu (CTA, CTC, CTG, CTT, TTA, TTG); Ser (TCA, TCC, TCG, TCT, AGC, AGT); Thr (ACA, ACC, ACG, ACT); Pro (CCA, CCC, CCG, CCT); Ala (GCA, GCC, GCG, GCT); Gly (GGA, GGC, GGG, GGT); Ile (ATA, ATC, ATT); Val (GTA, GTC, GTG, GTT); Lys (AAA, AAG); Asn (AAC, AAT); Gln (CAA, CAG); His (CAC, CAT); Glu (GAA, GAG); Asp (GAC, GAT); Tyr (TAC, TAT); Cys (TGC, TGT); Phe (TTC, TTT); Met (ATG); and Trp (TGG). Moreover, due to the redundancy of the genetic code (i.e., more than one codon for all but two amino acids), there are many possible DNA sequences which may code for a particular amino acid sequence.

SUMMARY OF THE INVENTION

DNA molecules comprising DNA encoding kinetically efficient, glyphosate-tolerant EPSP synthases are disclosed. The EPSP synthases of the present invention reduce the amount of overproduction of the EPSPS enzyme in a transgenic plant necessary for the enzyme to maintain catalytic activity while still conferring glyphosate tolerance. The EPSP synthases described herein represent a new class of EPSPS enzymes, referred to hereinafter as Class II EPSPS enzymes. Class II EPSPS enzymes of the present invention usually share only between about 47% and 55% amino acid similarity or between about 22% and 30% amino acid identity to other known bacterial or plant EPSPS enzymes and exhibit tolerance to glyphosate while maintaining suitable Km (PEP) ranges. Suitable ranges of Km (PEP) for EPSPS for enzymes of the present invention are between 1-150 µM, with a more preferred range of between 1-35 µM, and a most preferred range between 2-25 µM. These kinetic constants are determined under the assay conditions specified hereinafter. An EPSPS of the present invention preferably has a Ki for glyphosate range of between 15-10000 µM. The Ki/Km ratio should be between about 2-500, and more preferably between 25-500. The Vmax of the purified enzyme should preferably be in the range of 2-100 units/mg (µmoles/minute.mg at 25° C.) and the Km for shikimate-3-phosphate should preferably be in the range of 0.1 to 50 µM.

Genes coding for Class II EPSPS enzymes have been isolated from five (5) different bacteria: Agrobacterium tumefaciens sp. strain CP4, Achromobacter sp. strain LBAA, Pseudomonas sp. strain PG2982, Bacillus subtilis, and Staphylococcus aureus. The LBAA and PG2982 Class II EPSPS genes have been determined to be identical and the proteins encoded by these two genes are very similar to the CP4 protein and share approximately 84% amino acid identity with it. Class II EPSPS enzymes often may be distinguished from Class I EPSPS's by their inability to react with polyclonal antibodies prepared from Class I EPSPS enzymes under conditions where other Class I EPSPS enzymes would readily react with the Class I antibodies as well as the presence of certain unique regions of amino acid homology which are conserved in Class II EPSP synthases as discussed hereinafter.

Other Class II EPSPS enzymes can be readily isolated and identified by utilizing a nucleic acid probe from one of the Class II EPSPS genes disclosed herein using standard hybridization techniques. Such a probe from the CP4 strain has been prepared and utilized to isolate the Class II EPSPS genes from strains LBAA and PG2982. These genes may also optionally be adapted for enhanced expression in plants by known methodology. Such a probe has also been used to identify homologous genes in bacteria isolated de novo from soil.

The Class II EPSPS enzymes are preferably fused to a chloroplast transit peptide (CTP) to target the protein to the chloroplasts of the plant into which it may be introduced. Chimeric genes encoding this CTP-Class II EPSPS fusion protein may be prepared with an appropriate promoter and 3' polyadenylation site for introduction into a desired plant by standard methods.

To obtain the maximal tolerance to glyphosate herbicide it is preferable to transform the desired plant with a plant-expressible Class II EPSPS gene in conjunction with another plant-expressible gene which expresses a protein capable of degrading glyphosate such as a plant-expressible gene encoding a glyphosate oxidoreductase enzyme as described in PCT Application No. WO 92/00377, the disclosure of which is hereby incorporated by reference.

Therefore, in one aspect, the present invention provides a new class of EPSP synthases that exhibit a low Km for phosphoenolpyruvate (PEP), a high Vmax/Km ratio, and a high Ki for glyphosate such that when introduced into a plant, the plant is made glyphosate-tolerant such that the catalytic activity of the enzyme and plant metabolism are maintained in a substantially normal state. For purposes of this discussion, a highly efficient EPSPS refers to its efficiency in the presence of glyphosate.

More particularly, the present invention provides EPSPS enzymes having a Km for phosphoenolpyruvate (PEP) between 1-150 µM and a Ki(glyphosate)/Km(PEP) ratio between 3-500, said enzymes having the sequence domains:

-R-X1-H-X2-E- (SEQ ID NO:37), in which
X1 is an uncharged polar or acidic amino acid,
X2 is serine or threonine; and
-G-D-K-X3- (SEQ ID NO:38), in which
X3 is serine or threonine; and
-S-A-Q-X4-K- (SEQ ID NO:39), in which
X4 is any amino acid; and
-N-X5-T-R- (SEQ ID NO:40), in which
X5 is any amino acid.

Exemplary Class II EPSPS enzyme sequences are disclosed from seven sources: Agrobacterium sp. strain designated CP4, Achromobacter sp. strain LBAA, Pseudomonas sp. strain PG2982, Bacillus subtilis 1A2, Staphylococcus aureus (ATCC 35556), Synechocystis sp. PCC6803 and Dichelobacter nodosus.

In another aspect of the present invention, a double-stranded DNA molecule comprising DNA encoding a Class II EPSPS enzyme is disclosed. Exemplary Class II EPSPS enzyme DNA sequences are disclosed from seven sources: Agrobacterium sp. strain designated CP4, Achromobacter sp. strain LBAA, Pseudomonas sp. strain PG2982, Bacillus subtilis 1A2, Staphylococcus aureus (ATCC 35556), Synechocystis sp. PCC6803 and Dichelobacter nodosus.

In a further aspect of the present invention, nucleic acid probes from EPSPS Class II genes are presented that are suitable for use in screening for Class II EPSPS genes in
other sources by assaying for the ability of a DNA sequence from the other source to hybridize to the probe.

In yet another aspect of the present invention, a recombinant, double-stranded DNA molecule comprising in sequence:

a) a promoter which functions in plant cells to cause the production of an RNA sequence;
b) a structural DNA sequence that causes the production of an RNA sequence which encodes a Class II EPSPS enzyme having the sequence domains:
-R-X1-H-X2-E- (SEQ ID NO:37), in which
X1 is an uncharged polar or acidic amino acid,
X2 is serine or threonine; and
-G-D-K-X3- (SEQ ID NO:38), in which
X3 is serine or threonine; and
-S-A-Q-X4-K- (SEQ ID NO:39), in which
X4 is any amino acid; and
-N-X5-T-R- (SEQ ID NO:40), in which
X5 is any amino acid; and
c) a 3' nontranslated region which functions in plant cells to cause the addition of a stretch of polyadenyl nucleotides to the 3' end of the RNA sequence

where the promoter is heterologous with respect to the structural DNA sequence and adapted to cause sufficient expression of the EPSP synthase polypeptide to enhance the glyphosate tolerance of a plant cell transformed with said DNA molecule.

In still yet another aspect of the present invention, transgenic plants and transformed plant cells are disclosed that are made glyphosate-tolerant by the introduction of the above-described plant-expressible Class II EPSPS DNA molecule into the plant's genome.

In still another aspect of the present invention, a method for selectively controlling weeds in a crop field is presented by planting crop seeds or crop plants transformed with a plant-expressible Class II EPSPS DNA molecule to confer glyphosate tolerance to the plants which allows for glyphosate containing herbicides to be applied to the crop to selectively kill the glyphosate sensitive weeds, but not the crops.

Other and further objects, advantages and aspects of the invention will become apparent from the accompanying drawing figures and the description of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B show the DNA sequence (SEQ ID NO:1) for the full-length promoter of figwort mosaic virus (FMV35S).

FIG. 2 shows the cosmid cloning vector pMON17020.

FIG. 3A, 3B, 3C, 3D and 3E show the structural DNA sequence (SEQ ID NO:2) for the Class II EPSPS gene from the bacterial isolate Agrobacterium sp. strain CP4 and the deduced amino acid sequence (SEQ ID NO:3).

FIG. 4A, 4B, 4C, 4D and 4E show the structural DNA sequence (SEQ ID NO:4) for the Class II EPSPS gene from the bacterial isolate Achromobacter sp. strain LBAA and the deduced amino acid sequence (SEQ ID NO:5).

FIG. 5A, 5B, 5C, 5D and 5E show the structural DNA sequence (SEQ ID NO:6) for the Class II EPSPS gene from the bacterial isolate Pseudomonas sp. strain PG2982 and the deduced amino acid sequence (SEQ ID NO:7).

FIG. 6A and 6B show the Bestfit comparison of the CP4 EPSPS amino acid sequence (SEQ ID NO:3) with that for the E. coli EPSPS (SEQ ID NO:8).

FIG. 7A and 7B show the Bestfit comparison of the CP4 EPSPS amino acid sequence (SEQ ID NO:3) with that for the LBAA EPSPS (SEQ ID NO:5).

FIG. 8A and 8B show the structural DNA sequence (SEQ ID NO:9) for the synthetic CP4 Class II EPSPS gene.

FIG. 9 shows the DNA sequence (SEQ ID NO:10) of the chloroplast transit peptide (CTP) and encoded amino acid sequence (SEQ ID NO:11) derived from the Arabidopsis thaliana EPSPS CTP and containing a SphI restriction site at the chloroplast processing site, hereinafter referred to as CTP2.

FIG. 10A and 10B show the DNA sequence (SEQ ID NO:12) of the chloroplast transit peptide and encoded amino acid sequence (SEQ ID NO:13) derived from the Arabidopsis thaliana EPSPS gene and containing an EcoRI restriction site within the mature region of the EPSPS, hereinafter referred to as CTP3.

FIG. 11 shows the DNA sequence (SEQ ID NO:14) of the chloroplast transit peptide and encoded amino acid sequence (SEQ ID NO:15) derived from the Petunia hybrida EPSPS CTP and containing a SphI restriction site at the chloroplast processing site and in which the amino acids at the processing site are changed to -Cys-Met-, hereinafter referred to as CTP4.

FIG. 12A and 12B show the DNA sequence (SEQ ID NO:16) of the chloroplast transit peptide and encoded amino acid sequence (SEQ ID NO:17) derived from the Petunia hybrida EPSPS gene with the naturally occurring EcoRI site in the mature region of the EPSPS gene, hereinafter referred to as CTP5.

FIG. 13 shows a plasmid map of CP4 plant transformation/expression vector pMON17110.

FIG. 14 shows a plasmid map of CP4 synthetic EPSPS gene plant transformation/expression vector pMON17131.

FIG. 15 shows a plasmid map of CP4 EPSPS free DNA plant transformation expression vector pMON13640.

FIG. 16 shows a plasmid map of CP4 plant transformation/direct selection vector pMON17227.

FIG. 17 shows a plasmid map of CP4 plant transformation/expression vector pMON19653.

FIG. 18A, 18B, 18C and 18D show the structural DNA sequence (SEQ ID NO:41) for the Class II EPSPS gene from the bacterial isolate Bacillus subtilis and the deduced amino acid sequence (SEQ ID NO:42).

FIG. 19A, 19B, 19C and 19D show the structural DNA sequence (SEQ ID NO:43) for the Class II EPSPS gene from the bacterial isolate Staphylococcus aureus and the deduced amino acid sequence (SEQ ID NO:44).

FIG. 20A, 20B, 20C, 20D, 20E, 20F, ... show a comparison of the representative Class II EPSPS amino acid sequences Pseudomonas sp. strain PG2982 (SEQ ID NO:7), Achromobacter sp. strain LBAA (SEQ ID NO:5), Agrobacterium sp. strain designated CP4 (SEQ ID NO:3), Bacillus subtilis (SEQ ID NO:42), and Staphylococcus aureus (SEQ ID NO:44) with that for representative Class I EPSPS amino acid sequences [Saccharomyces cerevisiae (SEQ ID NO:49), Aspergillus nidulans (SEQ ID NO:50), Brassica napus (SEQ ID NO:51), Arabidopsis thaliana (SEQ ID NO:52), Nicotiana tabacum (SEQ ID NO:53), L. esculentum (SEQ ID NO:54), Petunia hybrida (SEQ ID NO:55), Zea mays (SEQ ID NO:56), Salmonella gallinarum (SEQ ID NO:57), Salmonella typhimurium (SEQ ID NO:58), Salmonella typhi (SEQ ID NO:65), E. coli (SEQ ID NO:8), K. pneumoniae (SEQ ID NO:59), Y. enterocolitica (SEQ ID NO:60), H. influenzae (SEQ ID NO:61), P. multocida (SEQ ID NO:62), Aeromonas salmonicida (SEQ ID NO:63), Bordetella pertussis (SEQ ID NO:64)] and illustrates the conserved regions among




Class II EPSPS sequences which are unique to Class II EPSPS sequences. To aid in a comparison of the EPSPS sequences, only mature EPSPS sequences were compared. That is, the sequence corresponding to the chloroplast transit peptide, if present in a subject EPSPS, was removed prior to making the sequence alignment.

FIG. 21A, 21B, 21C, 21D and 21E show the structural DNA sequence (SEQ ID NO:66) for the Class II EPSPS gene from the bacterial isolate Synechocystis sp. PCC6803 and the deduced amino acid sequence (SEQ ID NO:67).

FIG. 22A, 22B, 22C, 22D and 22E show the structural DNA sequence (SEQ ID NO:68) for the Class II EPSPS gene from the bacterial isolate Dichelobacter nodosus and the deduced amino acid sequence (SEQ ID NO:69).

FIG. 23A, 23B, 23C and 23D show the Bestfit comparison of the representative Class II EPSPS amino acid sequences Pseudomonas sp. strain PG2982 (SEQ ID NO:7), Achromobacter sp. strain LBAA (SEQ ID NO:5), Agrobacterium sp. strain designated CP4 (SEQ ID NO:3), Synechocystis sp. PCC6803 (SEQ ID NO:67), Bacillus subtilis (SEQ ID NO:42), Dichelobacter nodosus (SEQ ID NO:69) and Staphylococcus aureus (SEQ ID NO:44).

FIG. 24 shows a plasmid map of canola plant transformation/expression vector pMON17209.

FIG. 25 shows a plasmid map of canola plant transformation/expression vector pMON17237.

[...] OF THE INVENTION

The expression of a plant gene which exists in double-stranded DNA form involves synthesis of messenger RNA (mRNA) from one strand of the DNA by RNA polymerase enzyme, and the subsequent processing of the mRNA primary transcript inside the nucleus. This processing involves a 3' non-translated region which adds polyadenylate nucleotides to the 3' end of the RNA.

Transcription of DNA into mRNA is regulated by a region of DNA usually referred to as the "promoter." The promoter region contains a sequence of bases that signals RNA polymerase to associate with the DNA, and to initiate the transcription into mRNA using one of the DNA strands as a template to make a corresponding complementary strand of RNA. A number of promoters which are active in plant cells have been described in the literature. These include the nopaline synthase (NOS) and octopine synthase (OCS) promoters (which are carried on tumor-inducing plasmids of Agrobacterium tumefaciens), the cauliflower mosaic virus (CaMV) 19S and 35S promoters, the light-inducible promoter from the small subunit of ribulose bis-phosphate carboxylase (ssRUBISCO, a very abundant plant polypeptide) and the full-length transcript promoter from the figwort mosaic virus (FMV35S), promoters from the maize ubiquitin and rice actin genes. All of these promoters have been used to create various types of DNA constructs which have been expressed in plants; see, e.g., PCT publication WO 84/02913 (Rogers et al., Monsanto).

Promoters which are known or found to cause transcription of DNA in plant cells can be used in the present invention. Such promoters may be obtained from a variety of sources such as plants and plant DNA viruses and include, but are not limited to, the CaMV35S and FMV35S promoters and promoters isolated from plant genes such as ssRUBISCO genes and the maize ubiquitin and rice actin genes. As described below, it is preferred that the particular promoter selected should be capable of causing sufficient expression to result in the production of an effective amount of a Class II EPSPS to render the plant substantially tolerant to glyphosate herbicides. The amount of Class II EPSPS needed to induce the desired tolerance may vary with the plant species. It is preferred that the promoters utilized have relatively high expression in all meristematic tissues in addition to other tissues inasmuch as it is now known that glyphosate is translocated and accumulated in this type of plant tissue. Alternatively, a combination of chimeric genes can be used to cumulatively result in the necessary overall expression level of the selected Class II EPSPS enzyme to result in the glyphosate-tolerant phenotype.

The mRNA produced by a DNA construct of the present invention also contains a 5' non-translated leader sequence. This sequence can be derived from the promoter selected to express the gene, and can be specifically modified so as to increase translation of the mRNA. The 5' non-translated regions can also be obtained from viral RNAs, from suitable eukaryotic genes, or from a synthetic gene sequence. The present invention is not limited to constructs, as presented in the following examples, wherein the non-translated region is derived from both the 5' non-translated sequence that accompanies the promoter sequence and part of the 5' non-translated region of the virus coat protein gene. Rather, the non-translated leader sequence can be derived from an unrelated promoter or coding sequence as discussed above.

Preferred promoters for use in the present invention are the full-length transcript (SEQ ID NO:1) promoter from the figwort mosaic virus (FMV35S) and the full-length transcript (35S) promoter from cauliflower mosaic virus (CaMV), including the enhanced CaMV35S promoter (Kay et al., 1987). The FMV35S promoter functions as a strong and uniform promoter with particularly good expression in meristematic tissue for chimeric genes inserted into plants, particularly dicotyledons. The resulting transgenic plant in general expresses the protein encoded by the inserted gene at a higher and more uniform level throughout the tissues and cells of the transformed plant than the same gene driven by an enhanced CaMV35S promoter. Referring to FIG. 1, the DNA sequence (SEQ ID NO:1) of the FMV35S promoter is located between nucleotides 6368 and 6930 of the FMV genome. A 5' non-translated leader sequence is preferably coupled with the promoter. The leader sequence can be from the FMV35S genome itself or can be from a source other than FMV35S.

For expression of heterologous genes in monocotyledonous plants the use of an intron has been found to enhance expression of the heterologous gene. While one may use any of a number of introns which have been isolated from plant genes, the use of the first intron from the maize heat shock 70 gene is preferred.

The 3' non-translated region of the chimeric plant gene contains a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the viral RNA. Examples of suitable 3' regions are (1) the 3' transcribed, non-translated regions containing the polyadenylated signal of Agrobacterium tumor-inducing (Ti) plasmid genes, such as the nopaline synthase (NOS) gene, and (2) plant genes like the soybean storage protein genes and the small subunit of the ribulose-1,5-bisphosphate carboxylase (ssRUBISCO) gene. An example of a preferred 3' region is that from the ssRUBISCO gene from pea (E9), described in greater detail below.

The DNA constructs of the present invention also contain a structural coding sequence in double-stranded DNA form which encodes a glyphosate-tolerant, highly efficient Class II EPSPS enzyme.
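The four sequence domains recited in the Summary for Class II EPSPS enzymes (SEQ ID NO:37 through SEQ ID NO:40) can be read as simple residue patterns. The following is a minimal sketch, not a method taken from this specification, of how a mature protein sequence in one-letter code might be screened for them. The function names and the test peptide are hypothetical; the peptide is a made-up string built only to contain the four patterns and is not an EPSPS sequence. The residue classes follow the definitions given earlier in the text ("uncharged polar" = G, S, T, C, Y, N, Q; "acidic" = D, E).

```python
import re

# Residue classes as defined in the text, in one-letter code.
UNCHARGED_POLAR = "GSTCYNQ"       # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
ACIDIC = "DE"                     # Asp, Glu
ANY = "ACDEFGHIKLMNPQRSTVWY"      # any of the twenty standard residues

# One pattern per recited domain.
DOMAIN_PATTERNS = {
    "SEQ ID NO:37": re.compile(f"R[{UNCHARGED_POLAR}{ACIDIC}]H[ST]E"),  # -R-X1-H-X2-E-
    "SEQ ID NO:38": re.compile("GDK[ST]"),                              # -G-D-K-X3-
    "SEQ ID NO:39": re.compile(f"SAQ[{ANY}]K"),                         # -S-A-Q-X4-K-
    "SEQ ID NO:40": re.compile(f"N[{ANY}]TR"),                          # -N-X5-T-R-
}


def find_domains(protein: str) -> dict:
    """Return the 1-based start of the first match of each domain, or None."""
    seq = protein.upper()
    hits = {}
    for name, pattern in DOMAIN_PATTERNS.items():
        match = pattern.search(seq)
        hits[name] = match.start() + 1 if match else None
    return hits


def has_all_domains(protein: str) -> bool:
    """True when every one of the four domains occurs somewhere in the sequence."""
    return all(pos is not None for pos in find_domains(protein).values())


# Hypothetical peptide used only to exercise the patterns (assumption, not real data).
TOY_PEPTIDE = "MKRDHTEAAGDKSLLSAQVKPPNGTR"

if __name__ == "__main__":
    print(find_domains(TOY_PEPTIDE))
    print(has_all_domains(TOY_PEPTIDE))
```

This sketch only tests for presence of each pattern. It does not check their order or spacing along the sequence, and a pattern match alone would not establish that an enzyme has the kinetic properties described for Class II EPSPS.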
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Identification of glyphosate-tolerant, highly efficient EPSPS 
enzymes 

In an attempt to identify and isolate glyphosate-tolerant, 
highly efficient EPSPS enzymes, kinetic analysis of the 
EPSPS enzymes from a number of bacteria exhibiting 
tolerance to gfyphosate or that had been isolated from 
suitable sources was undertaken. It was discovered mat in 
some cases the EPSPS enzymes showed no tolerance to 
inhibition by glyphosate and it was concluded mat the 
tolerance phenotype of the bacterium was due to an imper- 
meability to glyphosate or other factors. In a number of 
cases, however, microorganisms were identified whose 
EPSPS enzyme showed a greater degree of tolerance to 
inhibition by glyphosate and that displayed a low E^, for 
PEP when compared to that previously reported for other 
microbial and plant sources. The EPSPS enzymes from these 
microorganisms were then subjected to further study and 
analysis. 

Table I displays the data obtained for the EPSPS enzymes 
identified and isolated as a result of the above described 
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ing in I liter (with autoclaved HjO), 1 ml each of A, B and 
C and 10 ml of D (as per below) and thiamine HQ (5 mg). 



A. D-F Salts (100QX stock; per 100 ml; autoclaved): 
H»B03 1 mg 
McS0 4 .7 Kfi 1 mg 
ZnS0 4 .7 IljO 115 mg 
CuS0 4 .5 Kfi 8 mg 
Nata*V3 HjO 1.7 mg 

B. FcS0 4 .7 H^O (1000X Stock; per 100 ml; autoclaved) 0J g 

C. MgS0 4 .7 HjO (100QX Stock; per 100 ml; autoclaved) 20 g 

D. (NBU>2S0 4 (100X stock; per 100 ml; autoclaved) 20 g 



Yeast Extract (YE; Difco) was added to a final concen- 
tration of 0.01 or 0.001%. The strain CP4 was also grown on 
15 media composed of D-F salts, amended as described above, 
containing glucose, gluconate and citrate (each at 0.1%) as 
carbon sources and with inorganic phosphate (0.2-1.0 mM) 
as the phosphorous source. 



Other Class II EPSPS containing microorganisms were 
analysis. Table I includes data for three identified Glass II 20 identified as Achromobacter sp. strain LBAA (Hallas et aL, 



EPSPS enzymes mat were observed to have a high tolerance 
to inhibition to glyphosate and a low K m for PEP as well as 
data for the natrve Petunia EPSPS and a glyphosate-tolerant 
variant of the Petunia EPSPS referred to as GA101. Hie 
GA101 variant is so named because it exhibits the substi- 
tution of an alanine residue for a glycine residue at position 
101 (with respect to Petunia). When the change introduced 
into the Petunia EPSPS (GA101) was introduced into a 
number of other EPSPS enzymes, similar changes in kinet- 
ics were obs erved, an elevation of the K, for glyphosate and 
of the K^, for PEP. 

TABLE I 





haracterizatk 


m of EPSPS enzymes 




ENZYME 


K^PEP 


Kj Glyphosate 




SOURCE 


(MM) 


(MM) 




PetnniH 


5 


0.4 


OJ08 


Petunia GA101 


200 


2000 


10 


PG2982 


2.1-3.1 1 


25-82 


-8-40 


LBAA 




60 (est) 7 


-79 


CM 


12 s 


2720 


227 


B. subtilis 1A2 


13 4 


440 


33.8 


& aureus 


5* 


200 


40 



'Range of PEP tested = 1-40 pM 
2 Range of PEP tested = 5-80 pM 
'Range of PEP tested = 1^-40 MM 
4 Range of PEP tested = 1-60 uM 
5 Range of PEP tested = 1-50 pM 
7 (est) = 



30 



35 



40 



45 



1988), Pseudomonas sp. strain PG2982 (Moore et aL, 1983; 
Fitzgfobon 1988), Bacillus subtilis 1A2 (Henner et aL, 1984) 
and Staphylococcus aureus (O'ConneH et aL, 1993). It had 
been reported previously, from measurements in crude 
ly sates, that the EPSPS enzyme from strain PG2982 was less 
sensitive to inhibition to glyphosate than that of £ coli, but 
there has been no report of the details of this lack of 
sensitivity and there has been no report on the for PEP 
for mis enzyme or of the DNA sequence for the gene for this 
enzyme (Fitzgibbon, 1988; Htzgibbon and Braymer, 1990). 
Relationship of the Class H EPSPS to those previously 
studied 

All EPSPS proteins studied to date have shown a remark- 
able degree of homology. For example, bacterial and plant 
EPSPS's are about 54% identical and with similarity as high 
as 80%. Within bacterial EPSPS's and plant EPSPS's them- 
selves the degree of identity and similarity is much greater 
(see Table II). 

TABLE H 



Comparison between exemplary ( 


3assIEI 


*SPS 


protein sequences 1 






gifnilanty iflwifily 


E. coli vs. £ typhimurium 


93 


88 


P. hybrids vs. B. coli 


72 


55 


P. hybrids vs. L esadentum 


93 


88 



The Agrobacterium sp. strain CP4 was initially identified
by its ability to grow on glyphosate as a carbon source (10
mM) in the presence of 1 mM phosphate. The strain CP4
was identified from a collection obtained from a fixed-bed
immobilized cell column that employed Mannville R-635
diatomaceous earth beads. The column had been run for
three months on a waste-water feed from a glyphosate
production plant. The column contained 50 mg/ml glyphosate
and NH3 as NH4Cl. Total organic carbon was 300
mg/ml and BOD's (Biological Oxygen Demand, a measure
of "soft" carbon availability) were less than 30 mg/ml. This
treatment column has been described (Heitkamp et al.,
1990). Dworkin-Foster minimal salts medium containing
glyphosate at 10 mM and with phosphate at 1 mM was used
to select for microbes from a wash of this column that were
capable of growing on glyphosate as sole carbon source.
Dworkin-Foster minimal medium was made up by combin- 



¹The EPSPS sequences compared here were obtained from the following
references: E. coli, Rogers et al., 1983; S. typhimurium, Stalker et al., 1985;
Petunia hybrida, Shah et al., 1986; and tomato (L. esculentum), Gasser et al.,
1988.

When crude extracts of CP4 and LBAA bacteria (50 µg
protein) were probed using rabbit anti-EPSPS antibody
(Padgette et al., 1987) to the Petunia EPSPS protein in a
Western analysis, no positive signal could be detected, even
with extended exposure times (Protein A-¹²⁵I development
system) and under conditions where the control EPSPS
(Petunia EPSPS, 20 ng; a Class I EPSPS) was readily
detected. The presence of EPSPS activity in these extracts
was confirmed by enzyme assay. This surprising result,
indicating a lack of similarity between the EPSPS's from
these bacterial isolates and those previously studied, coupled
with the combination of a low Km for PEP and a high Ki for
glyphosate, illustrates that these new EPSPS enzymes are
different from known EPSPS enzymes (now referred to as
Class I EPSPS).
Glyphosate-tolerant Enzymes in Microbial Isolates

For clarity and brevity of disclosure, the following description
of the isolation of genes encoding Class II EPSPS enzymes is
directed to the isolation of such a gene from a bacterial isolate.
Those skilled in the art will recognize that the same or similar
strategy can be utilized to isolate such genes from other microbial
isolates, plant or fungal sources.

Cloning of the Agrobacterium sp. strain CP4 EPSPS Gene(s) in E. coli

Having established the existence of a suitable EPSPS in
Agrobacterium sp. strain CP4, two parallel approaches were
undertaken to clone the gene: cloning based on the expected [...]
of the enzyme to provide material to raise antibodies and to
obtain amino acid sequences from [...] Maniatis et al., 1982 or
Sambrook et al., 1987. The cloning strategy was as follows:
introduction of a cosmid bank of strain Agrobacterium sp. strain
CP4 into E. coli and selection for the EPSPS gene by selection for
growth on inhibitory concentrations of glyphosate.

Chromosomal DNA was prepared from strain Agrobacterium sp. strain
CP4 as follows: The cell pellet from a 200 ml L-Broth (Miller,
1972), late log phase culture of Agrobacterium sp. strain CP4 was
resuspended in 10 ml of Solution I; 50 mM Glucose, 10 mM EDTA,
25 mM Tris-Cl pH 8.0 (Birnboim and Doly, 1979). SDS was added to a
final concentration of 1% and the suspension was subjected to
three freeze-thaw cycles, each consisting of immersion in dry ice
for 15 minutes and in water at 70° C. for 10 minutes. The lysate
was then extracted four times with equal volumes of
phenol:chloroform (1:1; phenol saturated with TE; TE=10 mM Tris
pH 8.0; 1.0 mM EDTA) and the phases separated by centrifugation
(15000 g; 10 minutes). The ethanol-precipitable material was
pelleted from the supernatant by brief centrifugation (8000 g; 5
minutes) following addition of two volumes of ethanol. The pellet
was resuspended in 5 ml TE and dialyzed for 16 hours at 4° C.
This preparation yielded a 5 ml DNA solution of 552 µg/ml.

Partially-restricted DNA was prepared as follows. Three 100 µg
aliquot samples of CP4 DNA were treated for 1 hour at 37° C. with
restriction endonuclease HindIII at rates of 4, 2 and 1 enzyme
unit/µg DNA, respectively. The DNA samples were pooled, made 0.25
mM with EDTA and extracted with an equal volume of
phenol:chloroform. Following the addition of sodium acetate and
ethanol, the DNA was precipitated with two volumes of ethanol and
pelleted by centrifugation (12000 g; 10 minutes). The dried DNA
pellet was resuspended in 500 µl TE and layered on a 10-40%
Sucrose gradient (in 5% increments of 5.5 ml each) in 0.5M NaCl,
50 mM Tris pH 8.0, 5 mM EDTA. Following centrifugation for 20
hours at 26,000 rpm in a SW28 rotor, the tubes were punctured and
~1.3 ml fractions collected. Samples (20 µl) of each second
fraction were run on 0.7% agarose gel and the size of the DNA
determined by comparison with linearized lambda DNA and
HindIII-digested lambda DNA standards. Fractions containing DNA of
25-35 kb fragments were pooled, desalted on AMICON10 columns (7000
rpm; 20° C.; 45 minutes) and concentrated by precipitation. This
procedure yielded 15 µg of CP4 DNA of the required size. A cosmid
bank was constructed using the vector pMON17020. This vector, a
map of which is presented in FIG. 2, is based on the pBR327
replicon and contains the spectinomycin/streptomycin (Spʳ; spc)
resistance gene from Tn7 (Fling et al., 1985), the chloramphenicol
resistance gene (Cmʳ; cat) from Tn9 (Alton et al., 1979), the
gene10 promoter region from phage T7 (Dunn et al., 1983), and the
1.6 kb BglII phage lambda cos fragment from pHC79 (Hohn and
Collins, 1980). A number of cloning sites are located downstream
of the cat gene. Since the predominant block to the expression of
genes from other microbial sources in E. coli appears to be at the
level of transcription, the use of the T7 promoter and supplying
the T7 polymerase [...] from the pGP1-2 plasmid (Tabor and
Richardson, 1985), enables the expression of large DNA [...]
polymerase transcription termination sequences. The expression of
the spc gene is impaired [...] containing pGP1-2. The use of
antibiotic resistances such as [...] is preferred due to the
observation that high level expression [...] the membrane
localized [...]

The vector was then cut with HindIII and treated with calf
alkaline phosphatase (CAP) in preparation for cloning. Vector and
target sequences were ligated by combining the following:

Vector DNA (HindIII/CAP)                          3 µg
Size fractionated CP4 HindIII fragments           1.5 µg
Ligation buffer                                   2.2 µl
T4 DNA ligase (New England Biolabs) (400 U/µl)    [...]

and adding H2O to 22.0 µl. This mixture was incubated for 18 hours
at 16° C. 10X ligation buffer is 250 mM Tris-HCl, pH 8.0; 100 mM
MgCl2; 100 mM Dithiothreitol; 2 mM Spermidine. The ligated DNA (5
µl) was packaged into lambda phage particles (Stratagene; Gigapack
Gold) using the manufacturer's procedure.

A sample (200 µl) of E. coli HB101 (Boyer and Rolland-Dussoix,
1973) containing the T7 polymerase expression plasmid pGP1-2
(Tabor and Richardson, 1985) and grown overnight in L-Broth (with
maltose at 0.2% and kanamycin at 50 µg/ml) was infected with 50 µl
of the packaged DNA. Transformants were selected at 30° C. on M9
(Miller, 1972) agar containing kanamycin (50 µg/ml),
chloramphenicol (25 µg/ml), L-proline (50 µg/ml), L-leucine (50
µg/ml) and B1 (5 µg/ml), and with glyphosate at 3.0 mM. Aliquot
samples were also plated on the same media lacking glyphosate to
titer the packaged cosmids. Cosmid transformants were isolated on
this latter medium at a rate of ~5×10⁵ per µg CP4 HindIII DNA
after 3 days at 30° C. Colonies arose on the glyphosate agar from
day 3 until day 15 with a final rate of ~1 per 200 cosmids. DNA
was prepared from 14 glyphosate-tolerant clones and, following
verification of this phenotype, was transformed into E. coli
GB100/pGP1-2 (E. coli GB100 is an aroA derivative of MM294
[Talmadge and Gilbert, 1980]) and tested for complementation for
growth in the absence of added aromatic amino acids and
aminobenzoic acids. Other aroA strains such as SR481 (Bachman et
al., 1980; Padgette et al., 1987), could be used and would be
suitable for this experiment. The use of GB100 is merely exemplary
and should not be viewed in a limiting sense. This aroA strain
usually requires that growth media be supplemented with
L-phenylalanine, L-tyrosine and L-tryptophan each at 100 µg/ml and
with para-hydroxybenzoic acid, 2,3-dihydroxybenzoic acid and
para-aminobenzoic acid
each at 5 µg/ml for growth in minimal media. Of the fourteen
cosmids tested only one showed complementation of the aroA-
phenotype. Transformants of this cosmid, pMON17076, showed weak
but uniform growth on the unsupplemented minimal media after 10
days.

The proteins encoded by the cosmids were determined in vivo using
a T7 expression system (Tabor and Richardson, 1985). Cultures of
E. coli containing pGP1-2 (Tabor and Richardson, 1985) and test
and control cosmids were grown at 30° C. in L-broth (2 ml) with
chloramphenicol and kanamycin (25 and 50 µg/ml, respectively) to a
Klett reading of ~50. An aliquot was removed and the cells
collected by centrifugation, washed with M9 salts (Miller, 1972)
and resuspended in 1 ml M9 medium containing glucose at 0.2%,
thiamine at 20 µg/ml and containing the 18 amino acids at 0.01%
(minus cysteine and methionine). Following incubation at 30° C.
for 90 minutes, the cultures were transferred to a 42° C. water
bath and held there for 15 minutes. Rifampicin (Sigma) was added
to 200 µg/ml and the cultures held at 42° C. for 10 additional
minutes and then transferred to 30° C. for 20 minutes. Samples
were pulsed with 10 µCi of ³⁵S-methionine for 5 minutes at 30° C.
The cells were collected by centrifugation and suspended in 60-120
µl cracking buffer (60 mM Tris-HCl 6.8, 1% SDS, 1%
2-mercaptoethanol, 10% glycerol, 0.01% bromophenol blue). Aliquot
samples were electrophoresed on 12.5% SDS-PAGE and following
soaking for 60 minutes in 10 volumes of acetic
acid-methanol-water (10:30:60), the gel was soaked in
ENLIGHTNING™ (DUPONT) following manufacturer's directions, dried,
and exposed at -70° C. to X-Ray film. Proteins of about 45 kd in
size, labeled with ³⁵S-methionine, were detected in a number of
the cosmids, including pMON17076.

Purification of EPSPS from Agrobacterium sp. strain CP4

All protein purification procedures were carried out at 3°-5° C.
EPSPS enzyme assays were performed using either the phosphate
release or radioactive HPLC method, as previously described in
Padgette et al., 1987, using 1 mM phosphoenol pyruvate (PEP,
Boehringer) and 2 mM shikimate-3-phosphate (S3P) substrate
concentrations. For radioactive HPLC assays, ¹⁴C-PEP (Amersham)
was utilized. S3P was synthesized as previously described in
Wibbenmeyer et al., 1988. N-terminal amino acid sequencing was
performed by loading samples onto a Polybrene precycled filter in
aliquots while drying. Automated Edman degradation chemistry was
used to determine the N-terminal protein sequence, using an
Applied Biosystems Model 470A gas phase sequencer (Hunkapiller et
al., 1983) with an Applied Biosystems 120A PTH analyzer.

Five 10-liter fermentations were carried out on a spontaneous
"smooth" isolate of strain CP4 that displayed less clumping when
grown in liquid culture. This reduced clumping and smooth colony
morphology may be due to reduced polysaccharide production by this
isolate. In the following section dealing with the purification of
the EPSPS enzyme, CP4 refers to the "smooth" isolate, CP4-S1. The
cells from the three batches showing the highest specific
activities were pooled. Cell paste of Agrobacterium sp. CP4 (300
g) was washed twice with 0.5 L of 0.9% saline and collected by
centrifugation (30 minutes, 8000 rpm in a GS3 Sorvall rotor). The
cell pellet was suspended in 0.9 L extraction buffer (100 mM
TrisCl, 1 mM EDTA, 1 mM BAM (Benzamidine), 5 mM DTT, 10% glycerol,
pH 7.5) and lysed by 2 passes through a Manton Gaulin cell. The
resulting solution was centrifuged (30 minutes, 8000 rpm) and the
supernatant was treated with 0.21 L of 1.5% protamine sulfate (in
100 mM TrisCl, pH 7.3, 0.2% w/v final protamine sulfate
concentration). After stirring for 1 hour, the mixture was
centrifuged (50 minutes, 8000 rpm) and the resulting supernatant
treated with solid ammonium sulfate to 40% saturation and stirred
for 1 hour. After centrifugation (50 minutes, 8000 rpm), the
resulting supernatant was treated with solid ammonium sulfate to
70% saturation, stirred for 50 minutes, and the insoluble protein
was collected by centrifugation (1 hour, 8000 rpm). This 40-70%
ammonium sulfate fraction was then dissolved in extraction buffer
to give a final volume of 0.2 L, and dialyzed twice (Spectrum
10,000 MW cutoff dialysis tubing) against 2 L of extraction buffer
for a total of 12 hours.

To the resulting dialyzed 40-70% ammonium sulfate fraction (0.29
L) was added solid ammonium sulfate to give a final concentration
of 1M. This material was loaded (2 ml/min) onto a column (5 cm×15
cm, 295 ml) packed with phenyl Sepharose CL-4B (Pharmacia) resin
equilibrated with extraction buffer containing 1M ammonium
sulfate, and washed with the same buffer (1.5 L, 2 ml/min). EPSPS
was eluted with a linear gradient of extraction buffer going from
1M to 0.00M ammonium sulfate (total volume of 1.5 L, 2 ml/min).
Fractions were collected (20 ml) and assayed for EPSPS activity by
the phosphate release assay. The fractions with the highest EPSPS
activity (fractions 36-50) were pooled and dialyzed against 3×2 L
(18 hours) of 10 mM TrisCl, 25 mM KCl, 1 mM EDTA, 5 mM DTT, 10%
glycerol, pH 7.8.

The dialyzed EPSPS extract (350 ml) was loaded (5 ml/min) onto a
column (2.4 cm×30 cm, 136 ml) packed with Q-Sepharose Fast Flow
(Pharmacia) resin equilibrated with 10 mM TrisCl, 25 mM KCl, 5 mM
DTT, 10% glycerol, pH 7.8 (Q Sepharose buffer), and washed with 1
L of the same buffer. EPSPS was eluted with a linear gradient of Q
Sepharose buffer going from 0.025M to 0.40M KCl (total volume of
1.4 L, 5 ml/min). Fractions were collected (15 ml) and assayed for
EPSPS activity by the phosphate release assay. The fractions with
the highest EPSPS activity (fractions 47-60) were pooled and the
protein was precipitated by adding solid ammonium sulfate to 80%
saturation and stirring for 1 hour. The precipitated protein was
collected by centrifugation (20 minutes, 12000 rpm in a GSA
Sorvall rotor), dissolved in Q Sepharose buffer (total volume of
14 ml), and dialyzed against the same buffer (2×1 L, 18 hours).

The resulting dialyzed partially purified EPSPS extract (19 ml)
was loaded (1.7 ml/min) onto a Mono Q 10/10 column (Pharmacia)
equilibrated with Q Sepharose buffer, and washed with the same
buffer (35 ml). EPSPS was eluted with a linear gradient of 0.025M
to 0.35M KCl (total volume of 119 ml, 1.7 ml/min). Fractions were
collected (1.7 ml) and assayed for EPSPS activity by the phosphate
release assay. The fractions with the highest EPSPS activity
(fractions 30-37) were pooled (6 ml).

The Mono Q pool was made 1M in ammonium sulfate by the addition of
solid ammonium sulfate and 2 ml aliquots were chromatographed on a
Phenyl Superose 5/5 column (Pharmacia) equilibrated with 100 mM
TrisCl, 5 mM DTT, 1M ammonium sulfate, 10% glycerol, pH 7.5
(Phenyl Superose buffer). Samples were loaded (1 ml/min), washed
with Phenyl Superose buffer (10 ml), and eluted with a linear
gradient of Phenyl Superose buffer going from 1M to 0.00M ammonium
sulfate (total volume of 60 ml, 1 ml/min). Fractions were
collected (1 ml) and assayed for EPSPS activity by the phosphate
release assay. The fractions from each run with the highest EPSPS
activity (fractions ~36-40) were pooled together (10 ml, 2.5 mg
protein). For N-terminal amino acid sequence determination, a
portion of
one fraction (#39 from run 1) was dialyzed against 50 mM NaHCO3
(2×1 L). The resulting pure EPSPS sample (0.9 ml, 77 µg protein)
was found to exhibit a single N-terminal amino acid sequence of:

XH(G)ASSRPA[...] (SEQ ID NO:18).

The remaining Phenyl Superose EPSPS pool was dialyzed against 50
mM TrisCl, 2 mM DTT, 10 mM KCl, 10% glycerol, pH 7.5 (2×1 L). An
aliquot (0.55 ml, 0.61 mg protein) was loaded (1 ml/min) onto a
Mono Q 5/5 column (Pharmacia) equilibrated with Q Sepharose
buffer, washed with the same buffer (5 ml), and eluted with a
linear gradient of Q Sepharose buffer going from 0-0.14M KCl in 10
minutes, then holding at 0.14M KCl (1 ml/min). Fractions were
collected (1 ml) and assayed for EPSPS activity by the phosphate
release assay and were subjected to SDS-PAGE (10-15%, Phast
System, Pharmacia, with silver staining) to determine protein
purity. Fractions exhibiting a single band of protein by SDS-PAGE
(22-25, 222 µg) were pooled and dialyzed against 100 mM ammonium
bicarbonate, pH 8.1 (2×1 L, 9 hours).

Trypsinolysis and peptide sequencing of Agrobacterium sp. strain
CP4 EPSPS

To the resulting pure Agrobacterium sp. strain CP4 EPSPS (111 µg)
was added 3 µg of trypsin (Calbiochem), and the trypsinolysis
reaction was allowed to proceed for 16 hours at 37° C. The tryptic
digest was then chromatographed (1 ml/min) on a C18 reverse phase
HPLC column (Vydac) as previously described in Padgette et al.,
1988 for E. coli EPSPS. For all peptide purifications, 0.1%
trifluoroacetic acid (TFA, Pierce) was designated buffer "RP-A"
and 0.1% TFA in acetonitrile was buffer "RP-B". The gradient used
for elution of the trypsinized Agrobacterium sp. CP4 EPSPS was:
0-8 minutes, 0% RP-B; 8-28 minutes, 0-15% RP-B; 28-40 minutes,
15-21% RP-B; 40-68 minutes, 21-49% RP-B; 68-72 minutes, 49-75%
RP-B; 72-74 minutes, 75-100% RP-B. Fractions were collected (1 ml)
and, based on the elution profile at 210 nm, at least 70 distinct
peptides were produced from the trypsinized EPSPS. Fractions 40-70
were evaporated to dryness and redissolved in 150 µl each of 10%
acetonitrile, 0.1% trifluoroacetic acid.

The fraction 61 peptide was further purified on the C18 column by
the gradient: 0-5 minutes, 0% RP-B; 5-10 minutes, 0-38% RP-B;
10-30 minutes, 38-45% B. Fractions were collected based on the UV
signal at 210 nm. A large peptide peak in fraction 24 eluted at
42% RP-B and was dried down, resuspended as described above, and
rechromatographed on the C18 column with the gradient: 0-5
minutes, 0% RP-B; 5-12 minutes, 0-38% RP-B; 12-15 minutes, 38-39%
RP-B; 15-18 minutes, 39% RP-B; 18-20 minutes, 39-41% RP-B; 20-24
minutes, 41% RP-B; 24-28 minutes, 42% RP-B. The peptide in
fraction 25, eluting at 41% RP-B and designated peptide 61-24-25,
was subjected to N-terminal amino acid sequencing, and the
following sequence was determined:

APSM(I)(D)EYPILAV (SEQ ID NO:19)

The CP4 EPSPS fraction 53 tryptic peptide was further purified by
C18 HPLC by the gradient 0% B (5 minutes), 0-30% B (5-17 minutes),
30-40% B (17-37 minutes). The peptide in fraction 28, eluting at
34% B and designated peptide 53-28, was subjected to N-terminal
amino acid sequencing, and the following sequence was determined:

ITGLLEGEDVINTGK (SEQ ID NO:20).

In order to verify the CP4 EPSPS cosmid clone, a number of
oligonucleotide probes were designed on the basis of the sequence
of two of the tryptic sequences from the CP4 enzyme (Table III).
The probe identified as MID was very low degeneracy and was used
for initial screening. The probes identified as EDV-C and EDV-T
were based on the same amino acid sequences and differ in one
position (underlined in Table III below) and were used as
confirmatory probes, [...] these two probes. In the
oligonucleotides below, alternate acceptable nucleotides at a
particular position are designated by a "/" such as A/C/T.

TABLE III

Selected CP4 EPSPS peptide sequences and DNA probes

PEPTIDE 61-24-25    APSM(I)(D)EYPILAV (SEQ ID NO:19)
Probe MID; 17-mer; mixed probe; 24-fold degenerate
ATGATA/C/TGAC/TGAG/ATAC/TCC (SEQ ID NO:21)
PEPTIDE 53-28       ITGLLEGEDVINTGK (SEQ ID NO:20)
Probe EDV-C; 17-mer; mixed probe; 48-fold degenerate
GAA/GGAC/TGTA/C/G/TATA/C/TAACAC (SEQ ID NO:22)
Probe EDV-T; 17-mer; mixed probe; 48-fold degenerate
GAA/GGAC/TGTA/C/G/TATA/C/TAATAC (SEQ ID NO:23)

The probes were labeled using gamma-³²P-ATP and polynucleotide
kinase. DNA from fourteen of the cosmids described above was
restricted with EcoRI, transferred to membrane and probed with the
oligonucleotide probes. The conditions used were as follows:
prehybridization was carried out in 6× SSC, 10× Denhardt's for
2-18 hour periods at 60° C., and hybridization was for 48-72 hours
in 6× SSC, 10× Denhardt's, 100 µg/ml tRNA at 10° C. below the Td
for the probe. The Td of the probe was approximated by the formula
2° C.×(A+T)+4° C.×(G+C). The filters were then washed three times
with 6× SSC for ten minutes each at room temperature, dried and
autoradiographed. Using the MID probe, an ~9.9 kb fragment in the
pMON17076 cosmid gave the only positive signal. This cosmid DNA
was then probed with the EDV-C (SEQ ID NO:22) and EDV-T (SEQ ID
NO:23) probes separately and again this ~9.9 kb band gave a signal
and only with the EDV-T probe.

The combined data on the glyphosate-tolerant phenotype, the
complementation of the E. coli aroA- phenotype, the expression of
a ~45 Kd protein, and the hybridization to two probes derived from
the CP4 EPSPS amino acid sequence strongly suggested that the
pMON17076 cosmid contained the EPSPS gene.

Localization and subcloning of the CP4 EPSPS gene

The CP4 EPSPS gene was further localized as follows: a number of
additional Southern analyses were carried out on different
restriction digests of pMON17076 using the MID (SEQ ID NO:21) and
EDV-T (SEQ ID NO:23) probes separately. Based on these analyses
and on subsequent detailed restriction mapping of the pBlueScript
(Stratagene) subclones of the ~9.9 kb fragment from pMON17076, a
3.8 kb EcoRI-SalI fragment was identified to which both probes
hybridized. This analysis also showed that MID (SEQ ID NO:21) and
EDV-T (SEQ ID NO:23) probes hybridized to different sides of
BamHI, ClaI, and SacII sites. This 3.8 kb fragment was cloned in
both orientations in pBlueScript to form pMON17081 and pMON17082.
The phenotypes imparted to E. coli by these clones were then
determined. Glyphosate tolerance was determined following
transformation into E. coli MM294 containing pGP1-2 (pBlueScript
also contains a T7 promoter) on M9 agar media containing
glyphosate at 3 mM. Both pMON17081 and pMON17082 showed
glyphosate-tolerant colonies at three days at 30° C.
at about half the size of the controls on the same media
lacking glyphosate. This result suggested that the 3.8 kb
fragment contained an intact EPSPS gene. The apparent lack
of orientation-dependence of this phenotype could be
explained by the presence of the T7 promoter at one side of
the cloning sites and the lac promoter at the other. The aroA
phenotype was determined in transformants of E. coli
GB100 on M9 agar media lacking aromatic supplements. In
this experiment, carried out with and without the Plac
inducer IPTG, pMON17082 showed much greater growth
than pMON17081, suggesting that the EPSPS gene was
expressed from the SalI site towards the EcoRI site.

Nucleotide sequencing was begun from a number of
restriction site ends, including the BamHI site discussed
above. Sequences encoding protein sequences that closely
matched the N-terminus protein sequence and that for the
tryptic fragment 53-28 (SEQ ID NO:20) (the basis of the
EDV-T probe) (SEQ ID NO:23) were localized to the SalI
side of this BamHI site. These data provided conclusive
evidence for the cloning of the CP4 EPSPS gene and for the
direction of transcription of this gene. These data coupled
with the restriction mapping data also indicated that the
complete gene was located on an ~2.3 kb XhoI fragment and
this fragment was subcloned into pBlueScript. The nucleotide
sequence of almost 2 kb of this fragment was determined
by a combination of sequencing from cloned restriction
fragments and by the use of specific primers to extend
the sequence. The nucleotide sequence of the CP4 EPSPS
gene and flanking regions is shown in FIG. 3 (SEQ ID
NO:2). The sequence corresponding to peptide 61-24-25
(SEQ ID NO:19) was also located. The sequence was
determined using both the SEQUENASE™ kit from IBI
(International Biotechnologies Inc.) and the T7 sequencing/
Deaza Kit from Pharmacia.

That the cloned gene encoded the EPSPS activity purified
from the Agrobacterium sp. strain CP4 was verified in the
following manner. By a series of site-directed mutageneses,
BglII and NcoI sites were placed at the N-terminus with the
fMet contained within the NcoI recognition sequence, the
first internal NcoI site was removed (the second internal
NcoI site was removed later), and a SacI site was placed
after the stop codons. At a later stage the internal NotI site
was also removed by site-directed mutagenesis. The following
list includes the primers for the site-directed mutagenesis
(addition or removal of restriction sites) of the CP4 EPSPS
gene. Mutagenesis was carried out by the procedures of
Kunkel et al. (1987), essentially as described in Sambrook et
al. (1989).

PRIMER BgNc (addition of BglII and NcoI sites to N-terminus)
CGIGGATAGAlOAGGAAGACAACX^ATGGCrcACGGTC
(SEQ ID NO:24)

PRIMER Sph2 (addition of SphI site to N-terminus)
GGATAGATTAAGGAAGACGCGCATGCrT^
(SEQ ID NO:25)

PRIMER S1 (addition of SacI site immediately after stop codons)
GGCTG(XriGAIX3AGCT
(SEQ ID NO:26)

PRIMER N1 (removal of internal NotI recognition site)
cGnrcxscTCcnxxnxxxnGGccGccxTC
(SEQ ID NO:27)

PRIMER Nco1 (removal of first internal NcoI recognition site)
CGGGCAAGGCCA1GCAGGCTATGGGCX30C
(SEQ ID NO:28)

PRIMER Nco2 (removal of second internal NcoI recognition site)
CGGGCHXXXX3CCTOACrATCGGCCTCarc
(SEQ ID NO:29)
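
Adding or removing restriction sites, as these primers do, is normally checked by scanning the resulting sequence for the enzymes' recognition motifs. The sketch below is an illustration only: the recognition sequences are the standard ones for these enzymes, but the `DEMO` sequence and the function names are hypothetical and are not the CP4 EPSPS gene or any primer from the list. It also shows why an NcoI site (CCATGG) can carry the initiator ATG.

```python
# Scan a DNA sequence for restriction sites used in the mutagenesis step.
# Recognition sequences are standard; DEMO and the function names are
# hypothetical and do not come from the source.
SITES = {
    "BglII": "AGATCT",
    "NcoI": "CCATGG",
    "SacI": "GAGCTC",
    "NotI": "GCGGCCGC",
    "SphI": "GCATGC",
}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif (overlaps allowed).

    All motifs here are palindromic, so scanning one strand is sufficient.
    """
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


def ncoi_start_codons(seq: str) -> list[int]:
    """0-based positions of ATG codons embedded in NcoI sites (CC-ATG-G)."""
    return [pos + 2 for pos in find_sites(seq, {"NcoI": SITES["NcoI"]})["NcoI"]]


# Hypothetical construct: BglII site, then an NcoI site holding the ATG,
# then a SacI site placed downstream.
DEMO = "TTAGATCTAAGGACCATGGCTAAAGAGCTCTT"
hits = find_sites(DEMO)
print(hits)
print("ATG inside NcoI at:", ncoi_start_codons(DEMO))
```

An empty list for an enzyme, as for NotI here, is the outcome expected after a successful site-removal mutagenesis.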

This CP4 EPSPS gene was then cloned as a NcoI-BamHI
N-terminal fragment plus a BamHI-SacI C-terminal fragment
into a PrecA-gene10L expression vector similar to
those described (Wong et al., 1988; Olins et al., 1988) to
form pMON17101. The Km for PEP and the Ki for glyphosate
were determined for the EPSPS activity in crude lysates
of pMON17101/GB100 transformants following induction
with nalidixic acid (Wong et al., 1988) and found to be the
same as that determined for the purified and crude enzyme
preparations from Agrobacterium sp. strain CP4.
Characterization of the EPSPS gene from Achromobacter
sp. strain LBAA and from Pseudomonas sp. strain PG2982
A cosmid bank of partially HindIII-restricted LBAA DNA
was constructed in E. coli MM294 in the vector pHC79
(Hohn and Collins, 1980). This bank was probed with a full
length CP4 EPSPS gene probe by colony hybridization and
positive clones were identified at a rate of ~1 per 400
cosmids. The LBAA EPSPS gene was further localized in
these cosmids by Southern analysis. The gene was located
on an ~2.8 kb XhoI fragment and by a series of sequencing
steps, both from restriction fragment ends and by using the
oligonucleotide primers from the sequencing of the CP4
EPSPS gene, the nucleotide sequence of the LBAA EPSPS
gene was completed and is presented in FIG. 4 (SEQ ID
NO:4).

The EPSPS gene from PG2982 was also cloned. The
EPSPS protein was purified, essentially as described for the
CP4 enzyme, with the following differences: Following the
Sepharose CL-4B column, the fractions with the highest
EPSPS activity were pooled and the protein precipitated by
adding solid ammonium sulfate to 85% saturation and
stirring for 1 hour. The precipitated protein was collected by
centrifugation, resuspended in Q Sepharose buffer and, following
dialysis against the same buffer, was loaded onto the
column (as for the CP4 enzyme). After purification on the Q
Sepharose column, ~40 mg of protein in 100 mM Tris pH
7.8, 10% glycerol, 1 mM EDTA, 1 mM DTT, and 1M
ammonium sulfate, was loaded onto a Phenyl Superose
(Pharmacia) column. The column was eluted at 1.0
ml/minute with a 40 ml gradient from 1.0M to 0.00M
ammonium sulfate in the above buffer.
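
For a linear gradient such as the 40 ml, 1.0M to 0.00M ammonium sulfate step above, the nominal salt concentration at any delivered volume follows by linear interpolation. The sketch below assumes an ideal gradient with no mixing delay or column dead volume, which the source does not discuss; the function names are illustrative.

```python
# Nominal salt concentration along an ideal linear elution gradient.
# Assumes no dead volume or mixing delay (an idealization, not from the source).
def gradient_concentration(
    volume_ml: float,
    start_molar: float = 1.0,
    end_molar: float = 0.0,
    total_ml: float = 40.0,
) -> float:
    """Concentration (M) after `volume_ml` of gradient has been delivered."""
    if total_ml <= 0:
        raise ValueError("total gradient volume must be positive")
    fraction = min(max(volume_ml / total_ml, 0.0), 1.0)  # clamp to [0, 1]
    return start_molar + (end_molar - start_molar) * fraction


def elution_time_min(volume_ml: float, flow_ml_per_min: float = 1.0) -> float:
    """Minutes needed to deliver `volume_ml` at a constant flow rate."""
    if flow_ml_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return volume_ml / flow_ml_per_min


quarter = gradient_concentration(10.0)   # one quarter of the way through
total_time = elution_time_min(40.0)      # whole gradient at 1.0 ml/minute
print(f"{quarter:.2f} M at 10 ml; gradient takes {total_time:.0f} minutes")
```

The defaults mirror the Phenyl Superose step described here; other gradients in the purification can be passed in as arguments.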

Approximately 1.0 mg of protein from the active fractions
of the Phenyl Superose 10/10 column was loaded onto a
Pharmacia Mono P 5/10 Chromatofocusing column with a
flow rate of 0.75 ml/minute. The starting buffer was 25 mM
bis-Tris at pH 6.3, and the column was eluted with 39 ml of
Polybuffer 74, pH 4.0. Approximately 50 µg of the peak
fraction from the Chromatofocusing column was dialyzed
into 25 mM ammonium bicarbonate. This sample was then
used to determine the N-terminal amino acid sequence.
The N-terminal sequence obtained was:

XHSASPKPATARRSE (where X=an unidentified residue) (SEQ
ID NO:30)

A number of degenerate oligonucleotide probes were
designed based on this sequence and used to probe a library
of PG2982 partial-HindIII DNA in the cosmid pHC79
(Hohn and Collins, 1980) by colony hybridization under
nonstringent conditions. Final washing conditions were 15
minutes with 1× SSC, 0.1% SDS at 55° C. One probe with
the sequence GCGGTBGCSGGYTTSGG (where B=C, G,
or T, S=C or G, and Y=C or T) (SEQ ID NO:31) identified
a set of cosmid clones.
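
A degenerate probe written with ambiguity codes stands for a pool of distinct oligonucleotides. The sketch below expands such a probe and estimates the spread of dissociation temperatures with the Td approximation given earlier in this document for the CP4 probes, 2° C. per A or T plus 4° C. per G or C. Applying that formula to this PG2982 probe is an assumption made here for illustration; the source does not report a Td for it.

```python
# Expand a degenerate probe and estimate Td for each member of the pool.
# The Td rule is the approximation stated earlier in this document;
# using it for this particular probe is an illustrative assumption.
from itertools import product
from math import prod

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(probe: str) -> int:
    """Number of distinct sequences represented by a degenerate probe."""
    return prod(len(IUPAC[base]) for base in probe.upper())


def expand(probe: str) -> list[str]:
    """List every concrete sequence encoded by the probe."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in probe.upper()))]


def td_celsius(oligo: str) -> int:
    """Td approximation: 2 per A/T plus 4 per G/C, in degrees Celsius."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


PROBE = "GCGGTBGCSGGYTTSGG"  # SEQ ID NO:31 as given in the text
pool = expand(PROBE)
td_values = [td_celsius(seq) for seq in pool]
print(f"{len(PROBE)}-mer, {degeneracy(PROBE)}-fold degenerate")
print(f"estimated Td range: {min(td_values)}-{max(td_values)} C")
```

Each ambiguity code multiplies the pool size by the number of bases it allows, which is how the fold-degeneracy figures quoted for mixed probes arise.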
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The cosmid set identified in this way was made up of 

cosmids of diverse HindTTT fragments. However, when this (SEQ ID NO:45) 

set was probed with the CP4 EPSPS gene probe, a cosmid GQAACAiAinAAACQAQAlAAGGTQCAG 

containing the PG2982 EPSPS gene was identified (seqidno:46) 

(designated as cosmid 9C1 originally and later as 5 cxsaaticaaacttcaggatc^ 

pMON20107). By a series of restriction mappings and Tfae other to me isdMon of fte R submis ^ 

Soothern analysis this gene was localized to a -2.8 kb Xhol gene? subclomng from ECE13 into pUC118, was performed 

fragment and the nucleotide sequence of this gene was as follows: 

determined. This DNA sequence (SEQ ID NO:6) is shown (i) Cut ECE13 and pUC with Xmal and SphL 

in EKj. 5. There are no nucleotide differences between the (ii) Isolate 1700bp aroE fragment and 2600bp pUC118 

EPSPS gene sequences from LBAA (SEQ ID NO:4) and vector fragment 

PG2982 (SEQ ID NO:6). The kinetic parameters of the two (iii) Ugate fragments and transform into GB100. 

enzymes are within the range of experimental error. The subclone was designated pMON21133 and the PCR- 

A gene from PG2982 that imparts glyphosate tolerance in i5 derived clone was named pMON21132. Clones from both 

E coU has been sequenced (Rtzg&bon, 1988; Htzgibbon approaches were first coiriimed for coinplementatioB i of the 

o«h n wmw ioor* tk« o«J«^«Xr^ a« w^xno otct>c aroA mutation m E. coh GB100. The cultures exhibited 

and Brayruer, 1990). The sequence of the PG2982 EPSPS c of Q m ^ ^ Q J x ^ fa 

Class H gene shows no homology to the previously reported ^ subclone (p M ON21133) and PCR-derived done 

sequence suggesting that the glyphosate-tolerant phenotype (pMON21132) enzymes, respectively. These specific actrvi- 

of the previous work is not related to EPSPS. 20 ties reflect the expected types of expression levels of the two 

Characterization of the EPSPS from Bacillus subtilis vectors. The B. subtilis EPSPS was found to be 88% and 

Bacillus subtilis 1A2 (prototroph) was obtained from the 100% res^ by 1 mM^hosateuntotihese 

Bacillus Genetic Stock Center at Ohio State University. S^TJlSn^ 0 ^ *MOroil33) a^ 

derived (pMON21132) enzymes, respectively. The appKm(PEP) and the
appKi(glyphosate) of the subcloned B. subtilis EPSPS (pMON21133)
were determined as described above. The data were analyzed
graphically by the same methods used for the 1A2 isolate, and the
results obtained were comparable to those reported above for the
B. subtilis 1A2 culture.

Standard EPSPS assay reactions contained crude bacterial extract
with 1 mM phosphoenolpyruvate (PEP), 2 mM shikimate-3-phosphate
(S3P), 0.1 mM ammonium molybdate, 5 mM potassium fluoride, and
50 mM HEPES, pH 7.0, at 25° C. One unit (U) of EPSPS activity is
defined as one umol EPSP formed per minute under these conditions.
For kinetic determinations, reactions contained crude bacterial
extract, 2 mM S3P, varying concentrations of PEP, and 50 mM HEPES,
pH 7.0, at 25° C. The EPSPS specific activity was found to be
0.003 U/mg. When the assays were performed in the presence of 1 mM
glyphosate, 100% of the EPSPS activity was retained. The
appKm(PEP) of the B. subtilis EPSPS was determined by measuring
the reaction velocity at varying concentrations of PEP. The
results were analyzed graphically by the hyperbolic,
Lineweaver-Burk and Eadie-Hofstee plots, which yielded appKm(PEP)
values of 15.3 uM, 10.8 uM and 12.3 uM, respectively. These three
data treatments are in good agreement and yield an average value
for appKm(PEP) of 13 uM. The appKi(glyphosate) was estimated by
determining the reaction rates of B. subtilis 1A2 EPSPS in the
presence of several concentrations of glyphosate, at a PEP
concentration of 2 uM. These results were compared to the
calculated Vmax of the EPSPS, and making the assumption that
glyphosate is a competitive inhibitor versus PEP for B. subtilis
EPSPS, as it is for all other characterized EPSPSs, an
appKi(glyphosate) was determined graphically. The
appKi(glyphosate) was found to be 0.44 mM.

Characterization of the EPSPS gene from Staphylococcus aureus

The kinetic properties of the S. aureus EPSPS expressed in E. coli
were determined, including the specific activity, the appKm(PEP),
and the appKi(glyphosate). The S. aureus EPSPS gene has been
previously described (O'Connell et al., 1993).

The strategy taken for the cloning of the S. aureus EPSPS was
polymerase chain reaction (PCR), utilizing the known nucleotide
sequence of the S. aureus aroA gene encoding EPSPS (O'Connell et
al., 1993). The S. aureus culture (ATCC 35556) was fermented in an
M2 facility in three 250 mL shake flasks containing 55 mL of TYB
(tryptone 5 g/L, yeast extract 3 g/L, pH 6.8). The three flasks
were inoculated with 1.5 mL each of a suspension made from freeze
dried ATCC 35556 S. aureus cells in 90 mL of PBS
(phosphate-buffered saline) buffer. Flasks were incubated at
30° C. for 5 days while shaking at 250 rpm. The resulting cells
were lysed (boiled in TE [Tris/EDTA] buffer for 8 minutes) and the
DNA utilized for PCR reactions. The EPSPS gene was amplified using
PCR and engineered into an E. coli expression vector as follows:

The EPSPS expressed from the B. subtilis aroE gene (i) two oligonucleotides were synthesized which incorpo- 

described by Henner et al. (1986) was also studied. The rated two restriction enzyme recognition sites (Ncol 

source of the B. subtilis aroE (EPSPS) gene was the E coli 55 and Sad) to the sequences of the oligonucleotides: 
plasmid-bearing strain ECE13 (original code=MM294[p 

trplOO]; Henner, et aL, 1984; obtained from the Bacillus (seq id NO:47) 

Genetic Stock Center at Ohio State University; the culture GGGGKXAiGGiAAAriQAACAAArcArTO 
genotype is (pBR322 trplOO] Ap [in MM294] [pBR322::6 

kb insert with tipFBA-hisH]). Two strategies were taken to 60 <x3GGGAGCiCArixrccciCAii^^ 
express the enzyme in £. coli GB100 (aroA-): 1) the gene 

was isolated by PCR and cloned into an overexpression (ii) The purified, PCR~amplified aroA gene from 5. aureus 

vector, and 2) the gene was subcloned into an overexpres- was digested using Ncol and Sad enzymes, 

sion vector. For the PCR cloning of the B. subtilis aroE from (iii) DNA of pMON 5723, which contains a pRecA 

ECE13, two oligonucleotides were synthesized which incor- 65 bacterial promoter and GenelO leader sequence (Olins 

porated two restriction enzyme recognition sites (Ndeland etal. 1988) was digested Ncol and Sad and the 3.5 kb 

EcoRI) to the sequences of the following oligonucleotides: digestion product was purified. 
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(iv) Hie 5. aureus FCR product and the Ncol / SacI The EPSPS from & aureus was found to be glyphosate- 
pMON 5723 fragment were ligated and transformed tolerant, with an appK/glyphosate) of approximately 0.2 
into E. coU JM101 competent cells. mM. In addition, the appKJPEP) for the enzyme is approxi- 

(v) Two spectmomyrin-resistant K coli JM101 clones mately 5 uM, yielding a appK^yrAosateyappK wt (PEP) of 
from above (SA#2 and SA#3) were purified and trans- s 40. > 
formed into a competent aroA- K coli strain, GB100 Alternative Isolation Protocols for Other Class H EPSPS 

For complementation experiments SAGB#2 and SAGB#3 Structural Genes 

were utilized, which correspond to SA#2 and SA#3, A number of Class H genes have been isolated and 

respectively, transformed into E. coli GB100. In addition, K v^^TL rf L„<, «fti** ^f™3™ ™* 

coU CB100 (negative control) andpMON 9563 (wt petunia M SE^™" f 1^ T 

EPSPS, positive control) were tested for AroA complemen- 10 ff^ j£. to lw ^ of snnH^ between the 

tation. The organisms were grown in ininimal media plus ^ssl and Class H eiizymes and genes the identification of 

and minus aromatic amino acids. Later analyses showed that other & nes was facilitated by the use of mis first 

the SA#2 and SA#3 clones were identical, and they were S ene ^ a P 100 ^ 61 mc cloning of the LB AA EPSPS gene, 

assigned the plasmid identifier pMON21139. me & 4 S ene P 10 ** allowed the rapid identification of 

SAGB#2 in E coli GB 100 (pMON21139) was also grown 15 cosmid clones and the localization of the intact gene to a 

in M9 minimal media and induced with nalidixic acid. A 51X5311 restriction fragment and some of the CP4 sequencing 

negative control, E. coli GB100, was grown under identical primers were also used to sequence the LBAA(and PG2982) 

conditions except the media was supplemented with arc- EPSPS gcne(s). The CP4 gene probe was also used to 

made amino acids. The cells were harvested, washed with confirm the PG2982 gene clone. The high degree of simi- 

0.9% Nad, and frozen at -80° G, for extraction and EPSPS 20 larity of the Class n EPSPS genes may be used to identify 

analysis. and clone additional genes in much the same way that Class 

The frozen pMON21139 & coU GB100 cell pellet from I EPSPS gene probes have been used to done other Class I 

above was extracted and assayed for EPSPS activity as genes. An example of the latter was in the cloning of the A 

previously described. EPSPS assays were performed using 1 thaliana EPSPS gene using the P. hybrida gene as a probe 

mM phosphoenolpyruvate (PEP), 2 mM shiMmate-3- 25 (Klee et aL, 1987). 

phosphate (S3P) 0. 1 mM ammomum molybdate, 5 mM GlyphosaWtolerant EPSPS activity has been reported 

g»uun fluoride, pH 7* 25° G The total assay volume for EPSP synthases from a number of sources. 

°° 10 PL of the undiluted desalted ^ enzymes ^ n £ ^ .^t,^ to any extent in 

^The results indicate that the two clones contain a tunc- ^t^' IL?***}™* 2? 8 U ***** 

tional aroA/EPSPS genTdnce Z were dfeTnTfa 30 P^.or^tibody probes proWd^ 

niinimal media which contained no aromatic amino acids. s f eenm g far me ***** of me EPSP ^ and provide tools for 

As expected, the GB100 culture did not grow on minimal ^ ™ d characterization of the genes for such 

medium without aromatic amino adds (since no functional enzymes. 

EPSPS is present), and the pMON9563 did confer growth in ^ of described were isolated from 

minimal media. These results demonstrated the successful 35 Dacteria wcre isolated from a gryphosate treatment 

cloning of a functional EPSPS gene from 5. aureus. Both faculty (Strains CP4 and LBAA). The third (PG2982) was 

clones tested were identical, and the K coli expression from a bacterium mat had been isolated from a culture 

vector was designated pMON21139. collection strain. This latter isolation confirms that exposure 

The plasmid pMON21139 in £L coli GB100 was grown in to gryphosate is not a prerequisite for the isolation of high 

M9 minimal media and was induced with nalidixic acid to 40 gryphosate-tolerant EPSPS enzymes and that the screening 

induce EPSPS expression driven from the RecA promoter. A of collections of bacteria could yield additional isolates. It is 

desalted extract of the intracellular protein was analyzed for possible to enrich for gryphosate degrading or gryphosate 

EPSPS activity, yielding an EPSPS specific activity of 0.005 resistant microbial populations (Quinn et aL, 1988; Talbot et 

umol/min mg. Under these assay conditions, the 5. aureus aL, 1984) in cases where it was felt mat enrichment for such 

EPSPS activity was completely resistant to inhibition by 1 45 microorganisms would enhance the isolation frequency of 

mM giyphosate. Previous analysis had shown that & coli Class II EPSPS microorganisms. Additional bacteria con- 

GB100 is devoid of EPSPS activity. raining class E EPSPS gene have also been identified A 

The appK TO (PEP) of the SL aureus EPSPS was determined bacterium called C 12, isolated from the same treatment 

by measuring the reaction velocity of the enzyme (in crude column beads as CP4 ( see above) but in a medium in which 

bacterial extracts) at varying concentrations of PEP. The 50 giyphosate was supplied as bom the carbon and phosphorus 

results were analyzed graphically using several standard source, was shown by Southern analysis to hybridize with a 

kinetic plotting methods. Data analysis using the hyperbolic, probe consisting of the CP4 EPSPS coding sequence. This 

Iineweaver-Burke, and Eadie-Hofstee methods yielded result, in conjunction with that for strain LBAA, suggests 

appK m (PMf) constants of 7.5, 4.8, and 4.0 uM, respectively. that this enrichment method facilitates the identification of 

These three data treatments are in good agreement, and yield 55 Class II EPSPS isolates. New bacterial isolates containing 

an average value for appl^fPEP) of 5 uM. Class II EPSPS genes have also been identified from envi- 

Further information of the giyphosate tolerance of & ronments other than giyphosate waste treatment facilities. 

aureus EPSPS was obtained by deterrmmng the reaction An inoculum was prepared by extracting soil (from a 

rates of the enzyme in the presence of several concentrations recently harvested soybean field in Jerseyville, HL) and a 

of giyphosate, at a PEP concentration of 2 uM. These results 60 population of bacteria selected by growth at 28° C. in 

were compared to the calculated maximal velocity of the DworMn-Foster medium containing giyphosate at 10 mM as 

EPSPS, and making the assumption that gryphosate is a a source of carbon (and with cydoheximide at 100 ug/ml to 

competitive inhibitor versus PEP for S. aureus EPSPS, as it prevent the growth of fungi). Upon plating on I^agar media, 

is for all other characterized EPSPSs, an appK^gfyphosate) five colony types were identified. Chromosomal DNA was 

was determined graphically. The appK^glyphosate) for & 65 prepared from 2ml L-broth cultures of these isolates and the 

aureus EPSPS estimated using mis method was found to be presence of a Class II EPSPS gene was probed using a the 

0.20 mM. CP4 EPSPS coding sequence probe by Southern analysis 
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under stringent hybridization and washing conditions. One 

of the soil isolates, S2, was positive by mis screen. pgdkskhrsfmpggl (SEQ n> NO:32) 

Qass II EPSPS enzymes are identifiable by an elevated Ki 

for glyphosate and thus the genes for these wfll impart a lbksnaaigcret. (SBqid no:33) 

glyphosate tolerance phenotype in heterologous hosts. 5 „ , AJ 

Expression of the gene tromrec^mant plasrnids or phage ThMe comparisons show that the overall relatedness of 

may be achieved through the use of a variety of expression Class 1 311(1 ( ^ ass n k EPSPS proteins is low and that 

promoters and include the T7 promoter and polymerase. Hie sequences in putative conserved regions have also diverged 

T7 promoter and polymerase system has been shown to considerably. 

work in a wide range of bacterial (and mammalian) hosts 10 me q>4 EPSPS an alanine residue is present at the 

and offers the advantage of expression of rnany proteins that "glydnelOr position. The replacement of the conserved 

JSf Pre ?^ 0n fagmentS ' Tolerance to (from me "95-10T region) by an alanine results in 

growth on glyphosate may be shown on minimal growth , * _ , ^ . . ^ 

media. In so^cases, otter genes or conditions thTmay ?^^% f "^™*? 

give glyphosate tolerance have been observed, including m °* ss 1 J™- of **™ ™* 
over expression of beta-lactamase, the igrA gene 15 which contains an alanine at this posMon, fee ^ for PEP is 
(Fitzgibbon and Braymer, 1990), or the gene for glyphosate m mc low range, indicatogl^the Class n enzymes differ 
oxidoreductase (PCT Pub. No. WO92/00377). These are m aspects from the EPSPS enzymes heretofore char- 
easily distinguished from Class H EPSPS by the absence of actenzed. 

EPSPS enzyme activity. Within the Qass II isolates, the degree of similarity/ 

The EPSPS protein is expressed from the aroA gene (also 20 identity is as frigfr as that noted for that within Class I (Table 

called aroE in some genera, for example, in Bacillus) and IVA). FIG. 7 displays the Bestfit computer program align- 

mutants in this gene have been produced in a wide variety ment of the CP4 (SEQ ID NO 3) and LBAA (SEQ ID NO ^ ) 

of bacteria. Determining the identity of the donor organism EPSPS deduced amino acid sequences with the CP4 

(bacterium) aids in the isolation of Class II EPSPS gene — sequence appearing as the top sequence in the Figure. The 

such identification may be accomplished by standard micro- 25 symbols used in FIGS. 6 and 7 are the standard symbols used 

biological methods and could include Gram stain reaction, in the Bestfit computer program to designate degrees of 

growth, color of culture, and gas or acid production on similarity and identity, 
different substrates, gas cinematography analysis of methy- 

lesters of the fatty acids in the membranes of the TABLE IVA ia 

microorganism, and determination of the GC % of the 30 

genome. The identity of the donor provides information that Comparison of relatedness of epsps protein sequences 

may be used to more easily isolate the EPSPS gene. An Comparison between Qass I and Class n EPSPS 

AroA- host more closely related to the donor organism could protsein sequences 



be employed to clone the EPSPS gene by complementation,
but this is not essential since complementation of the E. coli
AroA mutant by the CP4 EPSPS gene was observed. In
addition, the information on the GC content of the genome may
be used in choosing nucleotide probes: donor sources with
high GC % would preferably use the CP4 EPSPS gene or
sequences as probes, and those donors with low GC % would
preferably employ those from Bacillus subtilis, for example.
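
The probe-selection rule just described can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 55% cutoff separating "high" from "low" GC donors is an assumption, since the text gives no numerical threshold.

```python
def gc_percent(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage."""
    # Ignore anything that is not an unambiguous base (gaps, N, whitespace).
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def suggest_probe(donor_gc_percent: float, cutoff: float = 55.0) -> str:
    """Suggest a probe source from the donor genome's GC content.

    The default cutoff is an illustrative assumption, not a value
    taken from the text.
    """
    if donor_gc_percent >= cutoff:
        return "CP4 EPSPS gene or sequences"
    return "Bacillus subtilis EPSPS sequences"
```

For example, `suggest_probe(gc_percent(donor_dna))` would return the CP4 probe for a GC-rich donor and the B. subtilis probe otherwise, where `donor_dna` stands for whatever donor sequence is available.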
Relationships between different EPSPS genes 

The deduced amino acid sequences of a number of Class I
and Class II EPSPS enzymes were compared using the
Bestfit computer program provided in the UWGCG package
(Devereux et al., 1984). The degree of similarity and identity
as determined using this program is reported. The degree of
similarity/identity determined within Class I and Class II
protein sequences is remarkably high, for instance, comparing
E. coli with S. typhimurium (similarity/identity=93%/88%)
and even comparing E. coli with a plant EPSPS
(Petunia hybrida; 72%/55%). These data are shown in Table
IV. The comparison of sequences between Class I and Class
II, however, shows a much lower degree of relatedness
between the Classes (similarity/identity=50-53%/23-30%).
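
As a rough illustration of what the similarity/identity percentages measure, the sketch below scores two sequences that have already been aligned (gaps written as "-"). It is not the Bestfit algorithm: Bestfit builds the alignment itself and scores it with a substitution matrix, whereas the conservative-substitution groups used here are an assumption chosen only to make the example self-contained.

```python
# Hypothetical groups of residues treated as "similar"; an assumption
# standing in for the scoring matrix a real alignment program would use.
SIMILAR_GROUPS = [
    set("ILMV"), set("FWY"), set("KRH"), set("DE"),
    set("NQ"), set("ST"), set("AG"),
]


def identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (similarity %, identity %) for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns are not scored
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in g and y in g for g in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * similar / compared, 100.0 * identical / compared
```

Calling `identity_similarity("RSHTE-GDKS", "RNHSEAGDRT")` scores nine ungapped columns, of which five are identical and three more fall in the same assumed group.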

Hie display Of the Bestfit analysis for the K coli (SEQ ID « Comparison between Class 1 E PSPS protein sequences 

NO:8) and CP4 (SEQ ID N03) sequences shows the 

positions of the conserved residues and is presented in FIG. 
6. Previous analyses of EPSPS sequences had noted the high -,„«.,. 

degree of conservation of sequences of the enzymes and the fZ^fjf^i * 55 

almost invariance of sequences in two regions— the "20-35" ' 

and "95-10T regions (Gasser et aL, 1988; numbered 60 Comparison between Class n EPSPS protein sequences 

according to the Petunia EPSPS sequence)— and these 

regions are less conserved in the case of CP4 and LBAA similarity identity 

when cornpared to Class I bacterial and plant EPSPS ftnafcjIlf¥lCM ^ 43 

sequences (see HG. 6 for a comparison of the E. coh and lbaa vs. CP4 90 83 

CP4 EPSPS sequences with the K coli sequence appearing 65 pg2892 vs. CP4 90 83 

as the top sequence in the Figure). The corresponding s. aureus vs. CP4 58 34 
sequences in the CP4 Qass n EPSPS are: 



                                          similarity   identity

S. cerevisiae vs. CP4                         54          30
A. nidulans vs. CP4                           50          25
B. napus vs. CP4                              47          22
A. thaliana vs. CP4                           48          22
N. tabacum vs. CP4                            50          24
L. esculentum vs. CP4                         50          24
P. hybrida vs. CP4                            50          23
Z. mays vs. CP4                               48          24
S. gallinarum vs. CP4                         51          25
S. typhimurium vs. CP4                        51          25
S. typhi vs. CP4                              51          25
K. pneumoniae vs. CP4                         56          28
Y. enterocolitica vs. CP4                     53          25
H. influenzae vs. CP4                         53          27
P. multocida vs. CP4                          55          30
A. salmonicida vs. CP4                        53          23
B. pertussis vs. CP4                          53          27
E. coli vs. CP4                               52          26
E. coli vs. LBAA                              52          26
E. coli vs. B. subtilis                       55          29
E. coli vs. D. nodosus                        55          32
E. coli vs. S. aureus                         55          29
E. coli vs. Synechocystis sp. PCC6803         53          30

TABLE IVA-continued

B. subtilis vs. CP4                           59          41
Synechocystis sp. PCC6803 vs. CP4             62          45



1 The EPSPS sequences compared here were obtained from the following
references: E. coli, Rogers et al., 1983; S. typhimurium, Stalker et al.,
1985; Petunia hybrida, Shah et al., 1986; B. pertussis, Maskell et al.,
1988; S. cerevisiae, Duncan et al., 1987; Synechocystis sp. PCC6803,
Dalla Chiesa et al., 1994; and D. nodosus, Alm et al., 1994.

2 "GAP" Program, Genetics Computer Group (1991), Program Manual for
the GCG Package, Version 7, April 1991, 575 Science Drive, Madison,
Wisconsin, USA 53711.

The relative locations of the major conserved sequences
among Class II EPSP synthases, which distinguish this
group from the Class I EPSP synthases, are listed below in
Table IVB.

TABLE IVB

Location of Conserved Sequences in Class II EPSP Synthases

Source                        Seq. 1¹    Seq. 2²    Seq. 3³    Seq. 4⁴

CP4
  start                         200         26        173        271
  end                           204         29        177        274
LBAA
  start                         200         26        173        271
  end                           204         29        177        274
PG2982
  start                         200         26        173        273
  end                           204         29        177        276
B. subtilis
  start                         190         17        164        257
  end                           194         20        168        260
S. aureus
  start                         193         21        166        261
  end                           197         24        170        264
Synechocystis sp. PCC6803
  start                         210         34        183        278
  end                           214         38        187        281
D. nodosus
  start                         195         22        168        261
  end                           199         25        172        264

min. start                      190         17        164        257
max. end                        214         38        187        281

¹ -R-X1-H-X2-E- (SEQ ID NO:37)
² -G-D-K-X3- (SEQ ID NO:38)
³ -S-A-Q-X4-K- (SEQ ID NO:39)
⁴ -N-X5-T-R- (SEQ ID NO:40)

The domains of EPSP synthase sequence identified in this
application were determined to be those important for
maintenance of glyphosate resistance and productive binding of
PEP. The information used in identifying these domains
included sequence alignments of numerous
glyphosate-sensitive EPSPS molecules and the three-dimensional x-ray
structures of E. coli EPSPS (Stallings et al., 1991) and CP4
EPSPS. The structures are representative of a
glyphosate-sensitive (i.e., Class I) enzyme and a naturally occurring
glyphosate-tolerant (i.e., Class II) enzyme of the present
invention. These exemplary molecules were superposed
three-dimensionally and the results displayed on a computer
graphics terminal. Inspection of the display allowed for
structure-based fine-tuning of the sequence alignments of
glyphosate-sensitive and glyphosate-resistant EPSPS
molecules. The new sequence alignments were examined to
determine differences between Class I and Class II EPSPS
enzymes. Seven regions were identified and these regions
were located in the x-ray structure of CP4 EPSPS, which also
contained a bound analog of the intermediate which forms
catalytically between PEP and S3P.

The structure of the CP4 EPSPS with the bound
intermediate analog was displayed on a computer graphics terminal
and the seven sequence segments were examined. Important
residues for glyphosate binding were identified as well as
those residues which stabilized the conformations of those
important residues; adjoining residues were considered
necessary for maintenance of correct three-dimensional
structural motifs in the context of glyphosate-sensitive EPSPS
molecules. Three of the seven domains were determined not
to be important for glyphosate tolerance and maintenance of
productive PEP binding. The following four primary
domains were determined to be characteristic of Class II
EPSPS enzymes of the present invention:

-R-X1-H-X2-E- (SEQ ID NO:37), in which
X1 is an uncharged polar or acidic amino acid,
X2 is serine or threonine,

The Arginine (R) residue at position 1 is important
because the positive charge of its guanidinium group
destabilizes the binding of glyphosate. The Histidine
(H) residue at position 3 stabilizes the Arginine (R)
residue at position 4 of SEQ ID NO:40. The Glutamic
Acid (E) residue at position 5 stabilizes the Lysine (K)
residue at position 5 of SEQ ID NO:39.

-G-D-K-X3- (SEQ ID NO:38), in which
X3 is serine or threonine,

The Aspartic acid (D) residue at position 2 stabilizes
the Arginine (R) residue at position 4 of SEQ ID
NO:40. The Lysine (K) residue at position 3 is
important for productive PEP binding.

-S-A-Q-X4-K- (SEQ ID NO:39), in which
X4 is any amino acid,

The Alanine (A) residue at position 2 stabilizes the
Arginine (R) residue at position 1 of SEQ ID NO:37.
The Serine (S) residue at position 1 and the Glutamine
(Q) residue at position 3 are important for productive
S3P binding.

-N-X5-T-R- (SEQ ID NO:40), in which
X5 is any amino acid,

The Asparagine (N) residue at position 1 and the
Threonine (T) residue at position 3 stabilize residue X1
at position 2 of SEQ ID NO:37. The Arginine (R)
residue at position 4 is important because the positive
charge of its guanidinium group destabilizes the binding
of glyphosate.
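
The four conserved sequences above lend themselves to a simple pattern search. The following is a minimal sketch, not a method described in the specification: the helper names are hypothetical, and the residue set used for X1 ("uncharged polar or acidic", read here as S, T, N, Q, C, Y, D or E) is an assumption, since the text does not enumerate it.

```python
import re

# Assumed reading of "uncharged polar or acidic" for X1; the text does
# not list the residues, so this set is illustrative only.
X1 = "[STNQCYDE]"

MOTIFS = {
    "SEQ ID NO:37": re.compile(f"R{X1}H[ST]E"),  # -R-X1-H-X2-E-
    "SEQ ID NO:38": re.compile("GDK[ST]"),       # -G-D-K-X3-
    "SEQ ID NO:39": re.compile("SAQ.K"),         # -S-A-Q-X4-K-
    "SEQ ID NO:40": re.compile("N.TR"),          # -N-X5-T-R-
}


def find_motifs(protein: str) -> dict[str, list[tuple[int, int]]]:
    """Return 1-based (start, end) positions of each motif in a protein."""
    seq = protein.upper()
    return {
        name: [(m.start() + 1, m.end()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


def has_all_class_ii_motifs(protein: str) -> bool:
    """True when every one of the four motifs occurs at least once."""
    return all(find_motifs(protein).values())
```

A hit for all four patterns is only a screening signal under these assumptions; it does not replace the enzyme assays and hybridization experiments described in this application.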
Since the above sequences are only representative of the
Class II EPSPSs which would be included within the generic
structure of this group of EPSP synthases, the above
sequences may be found within a subject EPSP synthase
molecule within slightly more expanded regions. It is
believed that the above-described conserved sequences
would likely be found in the following regions of the mature
EPSP synthase molecule:
-R-X1-H-X2-E- (SEQ ID NO:37) located between amino
acids 175 and 230 of the mature EPSP synthase
sequence;

-G-D-K-X3- (SEQ ID NO:38) located between amino
acids 5 and 55 of the mature EPSP synthase sequence;
-S-A-Q-X4-K- (SEQ ID NO:39) located between amino
acids 150 and 200 of the mature EPSP synthase
sequence; and
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-N-Xs-T-R-(SEQ ID NO:40) located between amino acids the disruption of stretches of G's and C's, the elimination of 

245 and 295 of the mature EPSPS synthase sequence, potential polyadenylation sequences, and improvements in 

One difference mat may be noted between the deduced the codon usage to that used more frequently in plant genes, 

amino acid sequences of the CP4 and LB AA EPSPS proteins cou i d ItsaljL m highw egression of the CP4 EPSPS gene in 
is at position 100 where an Alanine is found in the case of 5 
the CP4 enzyme and a Glycine is found in the case of the 

LBAA enzyme. In the Class I EPSPS enzymes a Glycine is A synthetic CP4 gene was designed to change as com- 

usually found in the equivalent position, Le Glycine96 in K pletely as possible those inimical sequences discussed 

coli and K. pneumoniae and GlycinelOl in Petunia. In the above. Id summary, the gene sequence was redesigned to 
case of these three enzymes it has been reported mat 1Q eliminate as much as possible the following sequences or 

converting that Glycine to an Alanine results in an elevation sequence features (white avoiding the introduction of unnec- 

of the appKi for glyphosate and a concomitant elevation in essary restriction sites): stretches of G's and Cs of 5 or 

the appKm for PEP (Kishore et aL> 1986; Kishore and Shah, greater, and A+T rich regions (predorninantly) that could 

1988; Sost and AmrMiUWO), which as discussed above, M polyadeilvlation sites OT rna ^tMr 

m^meenzymele^ lization region. The sequence of this gene is shown in FIG. 

of lower PEP concentrations. The Glycine 100 of the LBAA „ /OT7in . —X^^ rTn ^ 1 & , . „ 

Z?£SS5£^^ andcompa^wi^^^ 

mutagenesis using the fonowir* mtaL ™ e parent Km ^for PEPfor the native and synfcetic genes 

^ uau * B ^ 20 was 11.8 and 1X7, respectively, indicating that the enzyme 

cggcaaitjOCGcxiaooggogcgcgcx^ (seq n> N034) ^Pressed from the synthetic gene was unaltered. The 

N-tenmnus of the coding sequence was mntagenized to 
and both the wild type and variant genes were expressed in place an SphI site at the ATG to permit the construction of 
E. coli in a RecA promoter expression vector (pMON17201 the CTP2-CP4 synthetic fusion for chlcroplast import The 
and pM ON 17264, respectively) and the appKm's and app- following primer was used to accomplish this mutagenesis: 

GGACGGCTTXTKXACXXn^ (SEQ ID NCk35) 

Ki's determined in crude ly sates. The data indicate that the 30 Expression of Chloroplast Directed CP4 EPSPS 
appKi(glyphosate) for the G10QA variant is elevated about The gtyphosate target in plants, the 5-enolpyruvyl- 
1 6-fold (Table V> This result is in agreement with the shikimate-3-phosphate synthase (EPSPS) enzyme, is located 
observation of the importance of this G-A change in raising in the chloroplast Many chloroplast-locahzed proteins, 
the appKi(gryphosate) in the Class I EPSPS enzymes. including EPSPS, are expressed from nuclear genes as 
However, in contrast to the results in the Class I G-A 35 precursors and are targeted to the chloroplast by a chloro- 
variants, the appKm(PEP) in the Class II (LBAA) G-A plast transit peptide (CTP) that is removed during the import 
variant is unaltered. This provides yet another distinction steps. Examples of other such chloroplast proteins include 
between the Class II and Class I EPSPS enzymes. the small sub-unit (SSU) of Rmulose-l^-bisphosphate car- 

boxylase (RUBISCO), Ferredoxin, Ferredoxin 
TABLE V oxidoreductase, the light-harvestmg-complex protein I and 

40 protein H, and Thioredoxin F. It has been demonstrated in 
vivo and in vitro mat non-chloroplast proteins may be 
targeted to the chloroplast by use of protein fusions with a 
CTP and that a CTP sequence is sufficient to target a protein 
to the chloroplast 
45 A CIP-CP4 EPSPS fusion was constructed between the 
Arabidopsis thaUana EPSPS CTP (Klee et aL , 1987) and the 
CP4 EPSPS coding sequences. The Arabidopsis CTP was 

©ranae of PEP: 2-40 uM . . « . „. * « . * „ _ . _ 

^ofglyj^Hio^^^ engineered by site^ected mutagenesis to place a SphI 

restriction site at the CTP processing site. This mutagenesis 
replaced the Glu-Lys at this location with Cys-Met. The
sequence of this CTP, designated as CTP2 (SEQ ID NO:10),
is shown in FIG. 9. The N-terminus of the CP4 EPSPS gene
was modified to place a SphI site that spans the Met codon.
The second codon was converted to one for leucine in this
step also. This change had no apparent effect on the in vivo
activity of CP4 EPSPS in E. coli as judged by rate of
complementation of the aroA allele. This modified
N-terminus was then combined with the SacI C-terminus
and cloned downstream of the CTP2 sequences. The
CTP2-CP4 EPSPS fusion was cloned into pBlueScript KS(+). This
vector may be transcribed in vitro using the T7 polymerase
and the RNA translated with 35S-Methionine to provide
material that may be evaluated for import into chloroplasts
isolated from Lactuca sativa using the methods described
hereinafter (della-Cioppa et al., 1986, 1987). This template
was transcribed in vitro using T7 polymerase and the
35S-methionine-labeled CTP2-CP4 EPSPS material was shown

The LBAA G100A variant, by virtue of its superior kinetic
properties, should be capable of imparting improved in
planta glyphosate tolerance.

Modification and Resynthesis of the Agrobacterium sp.
strain CP4 EPSPS Gene Sequence

The EPSPS gene from Agrobacterium sp. strain CP4
contains sequences that could be inimical to high expression
of the gene in plants. These sequences include potential
polyadenylation sites that are often A+T rich, a higher
G+C % than that frequently found in plant genes (63%
versus ~50%), concentrated stretches of G and C residues,
and codons that are not used frequently in plant genes. The
high G+C % in the CP4 EPSPS gene has a number of
potential consequences including the following: a higher
usage of G or C than that found in plant genes in the third
position in codons, and the potential to form strong hair-pin
structures that may affect expression or stability of the RNA.
The reduction in the G+C content of the CP4 EPSPS gene,





Lysate prepared from:                   appKm(PEP)    appKi(glyphosate)

E. coli/pMON17201 (wild type)              53 uM           28 uM*
E. coli/pMON17264 (G100A variant)          55 uM          459 uM#
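
The roughly 16-fold elevation in appKi(glyphosate) quoted for the G100A variant, and the appKi(glyphosate)/appKm(PEP) ratio of 40 quoted for the S. aureus enzyme, can be recomputed with the short sketch below. It reads the Table V appKi entries as 28 and 459 uM, which is an assumed reading of the units, and it uses the standard Michaelis-Menten expression for a competitive inhibitor, consistent with the competitive-inhibition assumption stated in the text. The function names are hypothetical.

```python
def fold_change(variant: float, wild_type: float) -> float:
    """Ratio of a variant's kinetic constant to the wild-type value."""
    return variant / wild_type


def selectivity_ratio(app_ki_um: float, app_km_um: float) -> float:
    """appKi(glyphosate) / appKm(PEP); both arguments in uM."""
    return app_ki_um / app_km_um


def fraction_activity_retained(pep_um: float, glyphosate_um: float,
                               app_km_um: float, app_ki_um: float) -> float:
    """Inhibited rate divided by uninhibited rate for a competitive inhibitor.

    v  = Vmax * S / (Km * (1 + I / Ki) + S)
    v0 = Vmax * S / (Km + S)
    Vmax cancels in the ratio, so it is not needed as an input.
    """
    uninhibited = pep_um / (app_km_um + pep_um)
    inhibited = pep_um / (app_km_um * (1 + glyphosate_um / app_ki_um) + pep_um)
    return inhibited / uninhibited
```

With the S. aureus figures from the text (appKm about 5 uM, appKi about 200 uM), `fraction_activity_retained(1000, 1000, 5, 200)` is close to 1, in line with the near-complete resistance reported at 1 mM PEP and 1 mM glyphosate, while the same call at 2 uM PEP gives a much smaller fraction, which is why the appKi determinations were run at low PEP.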
to import into chloroplasts with an efficiency comparable to
that for the control Petunia EPSPS (control: 35S-labeled
PreEPSPS [pMON6140; della-Cioppa et al., 1986]).

In another example the Arabidopsis EPSPS CTP, designated as
CTP3, was fused to the CP4 EPSPS through an EcoRI site. The
sequence of this CTP3 (SEQ ID NO:12) is shown in FIG. 10. An
EcoRI site was introduced into the Arabidopsis EPSPS mature
region around amino acid 27, replacing the sequence
-Arg-Ala-Leu-Leu- with -Arg-Ile-Leu-Leu- in the process. The
primer of the following sequence was used to modify the
N-terminus of the CP4 EPSPS gene to add an EcoRI site to effect
the fusion to the CTP3:

[...]GAAGAC[...] (SEQ ID NO:36) (the EcoRI site is underlined)

This CTP3-CP4 EPSPS fusion was also cloned into the pBlueScript
vector and the T7 expressed fusion was found to also import into
chloroplasts with an efficiency comparable to that for the
control Petunia EPSPS (pMON6140).

A related series of CTPs, designated as CTP4 (SphI) and CTP5
(EcoRI), based on the Petunia EPSPS CTP and gene were also fused
to the SphI- and EcoRI-modified CP4 EPSPS gene sequences. The
SphI site was added by site-directed mutagenesis to place this
restriction site (and change the amino acid sequence to
-Cys-Met-) at the chloroplast processing site. All of the
CTP-CP4 EPSPS fusions were shown to import into chloroplasts
with approximately equal efficiency. The CTP4 (SEQ ID NO:14)
and CTP5 (SEQ ID NO:16) sequences are shown in FIGS. 11 and 12.

A CTP2-LBAA EPSPS fusion was also constructed following the
modification of the N-terminus of the LBAA EPSPS gene by the
addition of a SphI site. This fusion was also found to be
imported efficiently into chloroplasts.

By similar approaches, the CTP2-CP4 EPSPS and the CTP4-CP4
EPSPS fusion have also been shown to import efficiently into
chloroplasts prepared from the leaf sheaths of corn. These
results indicate that these CTP-CP4 fusions could also provide
useful genes to impart glyphosate tolerance in monocot species.

The use of CTP2 or CTP4 is preferred because these transit
peptide constructions yield mature EPSPS enzymes upon import
into the chloroplast which are closer in composition to the
native EPSPSs not containing a transit peptide signal. Those
skilled in the art will recognize that various chimeric
constructs can be made which utilize the functionality of a
particular CTP to import a Class II EPSPS enzyme into the plant
cell chloroplast. The chloroplast import of the Class II EPSPS
can be determined using the following assay.

Chloroplast Uptake Assay

Intact chloroplasts are isolated from lettuce (Lactuca sativa,
var. longifolia) by centrifugation in Percoll/ficoll gradients
as modified from Bartlett et al. (1982). The final pellet of
intact chloroplasts is suspended in 0.5 ml of sterile 330 mM
sorbitol in 50 mM Hepes-KOH, pH 7.7, assayed for chlorophyll
(Arnon, 1949), and adjusted to the final chlorophyll
concentration of 4 mg/ml (using sorbitol/Hepes). The yield of
intact chloroplasts from a single head of lettuce is 3-6 mg
chlorophyll.

A typical 300 µl uptake experiment contained 5 mM ATP, 8.3 mM
unlabeled methionine, 322 mM sorbitol, 58.3 mM Hepes-KOH
(pH 8.0), 50 µl reticulocyte lysate translation products, and
intact chloroplasts from L. sativa (200 µg chlorophyll). The
uptake mixture is gently rocked at room temperature (in
10×75 mm glass tubes) directly in front of a fiber optic
illuminator set at maximum light intensity (150 Watt bulb).
Aliquot samples of the uptake mix (about 50 µl) are removed at
various times and fractionated over 100 µl silicone-oil
gradients (in 150 µl polyethylene tubes) by centrifugation at
11,000×g for 30 seconds. Under these conditions, the intact
chloroplasts form a pellet under the silicone-oil layer and the
incubation medium (containing the reticulocyte lysate) floats
on the surface. After centrifugation, the silicone-oil
gradients are immediately frozen in dry ice. The chloroplast
pellet is then resuspended in 50-100 µl of lysis buffer (10 mM
Hepes-KOH pH 7.5, 1 mM PMSF, 1 mM benzamidine, 5 mM
ε-amino-n-caproic acid, and 30 µg/ml aprotinin) and centrifuged
at 15,000×g for 20 minutes to pellet the thylakoid membranes.
The clear supernatant (stromal proteins) from this spin, and an
aliquot of the reticulocyte lysate incubation medium from each
uptake experiment, are mixed with an equal volume of
2×SDS-PAGE sample buffer for electrophoresis (Laemmli, 1970).

SDS-PAGE is carried out according to Laemmli (1970) in 3-17%
(w/v) acrylamide slab gels (60 mm×1.5 mm) with 3% (w/v)
acrylamide stacking gels (5 mm×1.5 mm). The gel is fixed for
20-30 min in a solution with 40% methanol and 10% acetic acid.
Then, the gel is soaked in EN3HANCE™ (DuPont) for 20-30
minutes, followed by drying the gel on a gel dryer. The gel is
imaged by autoradiography, using an intensifying screen and an
overnight exposure to determine whether the CP4 EPSPS is
imported into the isolated chloroplasts.

Plant Transformation

Plants which can be made glyphosate-tolerant by practice of the
present invention include, but are not limited to, soybean,
cotton, corn, canola, oil seed rape, flax, sugarbeet,
sunflower, potato, tobacco, tomato, wheat, rice, alfalfa and
lettuce as well as various tree, nut and vine species.

A double-stranded DNA molecule of the present invention
("chimeric gene") can be inserted into the genome of a plant by
any suitable method. Suitable plant transformation vectors
include those derived from a Ti plasmid of Agrobacterium
tumefaciens, as well as those disclosed, e.g., by
Herrera-Estrella (1983), Bevan (1984), Klee (1985) and EPO
publication 120,516 (Schilperoort et al.). In addition to plant
transformation vectors derived from the Ti or root-inducing
(Ri) plasmids of Agrobacterium, alternative methods can be used
to insert the DNA constructs of this invention into plant
cells. Such methods may involve, for example, the use of
liposomes, electroporation, chemicals that increase free DNA
uptake, free DNA delivery via microprojectile bombardment, and
transformation using viruses or pollen.

Class II EPSPS Plant transformation vectors

Class II EPSPS DNA sequences may be engineered into vectors
capable of transforming plants by using known techniques. The
following description is meant to be illustrative and not to be
read in a limiting sense. One of ordinary skill in the art
would know that other plasmids, vectors, markers, promoters,
etc. would be used with suitable results. The CTP2-CP4 EPSPS
fusion was cloned as a BglII-EcoRI fragment into the plant
vector pMON979 (described below) to form pMON17110, a map of
which is presented in FIG. 13. In this vector the CP4 gene is
expressed from the enhanced CaMV35S promoter (E35S; Kay et al.,
1987). A FMV35S promoter construct (pMON17116) was completed in
the following way: The SalI-NotI and the NotI-BglII fragments
from pMON979 containing the Spc/AAC(3)-III/oriV and the
pBR322/Right Border/NOS 3'/CP4 EPSPS gene segment from
pMON17110 were ligated with the XhoI-BglII FMV35S promoter
fragment from pMON981. These vectors were introduced into
tobacco, cotton and canola.
A series of vectors was also completed in the vector pMON977 in
which the CP4 EPSPS gene, the CTP2-CP4 EPSPS fusion, and the
CTP3-CP4 fusion were cloned as BglII-SacI fragments to form
pMON17124, pMON17119, and pMON17120, respectively. These
plasmids were introduced into tobacco. A pMON977 derivative
containing the CTP2-LBAA EPSPS gene was also completed
(pMON17206) and introduced into tobacco.

The pMON979 plant transformation/expression vector was derived
from pMON886 (described below) by replacing the neomycin
phosphotransferase type II (KAN) gene in pMON886 with the
0.89 kb fragment containing the bacterial
gentamicin-3-N-acetyltransferase type III (AAC(3)-III) gene
(Hayford et al., 1988). The chimeric P-35S/AAC(3)-III/NOS 3'
gene encodes gentamicin resistance which permits selection of
transformed plant cells. pMON979 also contains a 0.95 kb
expression cassette consisting of the enhanced CaMV 35S
promoter (Kay et al., 1987), several unique restriction sites,
and the NOS 3' end (P-En-CaMV35S/NOS 3'). The rest of the
pMON979 DNA segments are exactly the same as in pMON886.

Plasmid pMON886 is made up of the following segments of DNA.
The first is a 0.93 kb AvaI to engineered-EcoRV fragment
isolated from transposon Tn7 that encodes bacterial
spectinomycin/streptomycin resistance (Spc/Str), which is a
determinant for selection in E. coli and Agrobacterium
tumefaciens. This is joined to the 1.61 kb segment of DNA
encoding a chimeric kanamycin resistance which permits
selection of transformed plant cells. The chimeric gene
(P-35S/KAN/NOS 3') consists of the cauliflower mosaic virus
(CaMV) 35S promoter, the neomycin phosphotransferase type II
(KAN) gene, and the 3'-nontranslated region of the nopaline
synthase gene (NOS 3') (Fraley et al., 1983). The next segment
is the 0.75 kb oriV containing the origin of replication from
the RK2 plasmid. It is joined to the 3.1 kb SalI to PvuI
segment of pBR322 (ori322) which provides the origin of
replication for maintenance in E. coli and the bom site for the
conjugational transfer into the Agrobacterium tumefaciens
cells. The next segment is the 0.36 kb PvuI to BclI from pTiT37
that carries the nopaline-type T-DNA right border (Fraley et
al., 1985).

The pMON977 vector is the same as pMON981 except for the
presence of the P-En-CaMV35S promoter in place of the FMV35S
promoter (see below).

The pMON981 plasmid contains the following DNA segments: the
0.93 kb fragment isolated from transposon Tn7 encoding
bacterial spectinomycin/streptomycin resistance [Spc/Str, a
determinant for selection in E. coli and Agrobacterium
tumefaciens (Fling et al., 1985)]; the chimeric kanamycin
resistance gene engineered for plant expression to allow
selection of the transformed tissue, consisting of the 0.35 kb
cauliflower mosaic virus 35S promoter (P-35S) (Odell et al.,
1985), the 0.83 kb neomycin phosphotransferase type II gene
(KAN), and the 0.26 kb 3'-nontranslated region of the nopaline
synthase gene (NOS 3') (Fraley et al., 1983); the 0.75 kb
origin of replication from the RK2 plasmid (oriV) (Stalker et
al., 1981); the 3.1 kb SalI to PvuI segment of pBR322 which
provides the origin of replication for maintenance in E. coli
(ori-322) and the bom site for the conjugational transfer into
the Agrobacterium tumefaciens cells, and the 0.36 kb PvuI to
BclI fragment from the pTiT37 plasmid containing the
nopaline-type T-DNA right border region (Fraley et al., 1985).
The expression cassette consists of the 0.6 kb 35S promoter
from the figwort mosaic virus (P-FMV35S) (Gowda et al., 1989)
and the 0.7 kb 3' non-translated region of the pea rbcS-E9 gene
(E9 3') (Coruzzi et al., 1984, and Morelli et al., 1985). The
0.6 kb SspI fragment containing the FMV35S promoter (FIG. 1)
was engineered to place suitable cloning sites downstream of
the transcriptional start site. The CTP2-CP4syn gene fusion was
introduced into plant expression vectors (including pMON981, to
form pMON17131; FIG. 14) and transformed into tobacco, canola,
potato, tomato, sugarbeet, cotton, lettuce, cucumber, oil seed
rape, poplar, and Arabidopsis.

The plant vector containing the Class II EPSPS gene may be
mobilized into any suitable Agrobacterium strain for
transformation of the desired plant species. The plant vector
may be mobilized into an ABI Agrobacterium strain. A suitable
ABI strain is the A208 Agrobacterium tumefaciens carrying the
disarmed Ti plasmid pTiC58 (pMP90RK) (Koncz and Schell, 1986).
The Ti plasmid does not carry the T-DNA phytohormone genes and
the strain is therefore unable to cause the crown gall disease.
Mating of the plant vector into ABI was done by the triparental
conjugation system using the helper plasmid pRK2013 (Ditta et
al., 1980). When the plant tissue is incubated with the
ABI::plant vector conjugate, the vector is transferred to the
plant cells by the vir functions encoded by the disarmed pTiC58
plasmid. The vector opens at the T-DNA right border region, and
the entire plant vector sequence may be inserted into the host
plant chromosome. The pTiC58 Ti plasmid does not transfer to
the plant cells but remains in the Agrobacterium.

Class II EPSPS free DNA vectors

Class II EPSPS genes may also be introduced into plants through
direct delivery methods. A number of direct delivery vectors
were completed for the CP4 EPSPS gene. The vector pMON13640, a
map of which is presented in FIG. 15, is described here. The
plasmid vector is based on a pUC plasmid (Vieira and Messing,
1987) containing, in this case, the nptII gene (kanamycin
resistance; KAN) from Tn903 to provide a selectable marker in
E. coli. The CTP4-EPSPS gene fusion is expressed from the
P-FMV35S promoter and contains the NOS 3' polyadenylation
sequence fragment and from a second cassette consisting of the
E35S promoter, the CTP4-CP4 gene fusion and the NOS 3'
sequences. The scoreable GUS marker gene (Jefferson et al.,
1987) is expressed from the mannopine synthase promoter (P-MAS;
Velten et al., 1984) and the soybean 7S storage protein gene 3'
sequences (Schuler et al., 1982). Similar plasmids could also
be made in which CTP-CP4 EPSPS fusions are expressed from the
enhanced CaMV35S promoter or other plant promoters. Other
vectors could be made that are suitable for free DNA delivery
into plants and such are within the skill of the art and
contemplated to be within the scope of this disclosure.

Plastid transformation:

While transformation of the nuclear genome of plants is much
more developed at this time, a rapidly advancing alternative is
the transformation of plant organelles. The transformation of
plastids of land plants and the regeneration of stable
transformants has been demonstrated (Svab et al., 1990; Maliga
et al., 1993). Transformants are selected, following double
cross-over events into the plastid genome, on the basis of
resistance to spectinomycin conferred through rRNA changes or
through the introduction of an aminoglycoside
3'-adenyltransferase gene (Svab et al., 1990; Svab and Maliga,
1993), or resistance to kanamycin through the neomycin
phosphotransferase NptII (Carrer et al., 1993). DNA is
introduced by biolistic means (Svab et al., 1990; Maliga et
al., 1993) or by using polyethylene glycol (O'Neill et al.,
1993). This transformation route results in the production of
500-10,000 copies of the introduced sequence per cell and high
levels of expression of the introduced gene have been reported
(Carrer et al., 1993; Maliga et al., 1993). The use of plastid
transformation offers the advantages of not requiring the
chloroplast transit peptide signal sequence to result in the
localization of the heterologous Class II EPSPS in the
chloroplast and the potential to have many copies of the
heterologous plant-expressible Class II EPSPS gene in each
plant cell since at least one copy of the gene would be in each
plastid of the cell.

Plant Regeneration 

When expression of the Class II EPSPS gene is achieved in
transformed cells (or protoplasts), the cells (or protoplasts)
are regenerated into whole plants. Choice of methodology for
the regeneration step is not critical, with suitable protocols
being available for hosts from Leguminosae (alfalfa, soybean,
clover, etc.), Umbelliferae (carrot, celery, parsnip),
Cruciferae (cabbage, radish, rapeseed, etc.), Cucurbitaceae
(melons and cucumber), Gramineae (wheat, rice, corn, etc.),
Solanaceae (potato, tobacco, tomato, peppers), various floral
crops as well as various trees such as poplar or apple, nut
crops or vine plants such as grapes. See, e.g., Ammirato, 1984;
Shimamoto, 1989; Fromm, 1990; Vasil, 1990.

The following examples are provided to better elucidate 
the practice of the present invention and should not be 
interpreted in any way to limit the scope of the present 
invention. Those skilled in the art will recognize that various 
modifications, truncations, etc. can be made to the methods
and genes described herein while not departing from the 
spirit and scope of the present invention. 

In the examples that follow, EPSPS activity in plants is
assayed by the following method. Tissue samples were collected
and immediately frozen in liquid nitrogen. One gram of young
leaf tissue was frozen in a mortar with liquid nitrogen and
ground to a fine powder with a pestle. The powder was then
transferred to a second mortar, extraction buffer was added
(1 ml/gram), and the sample was ground for an additional 45
seconds. The extraction buffer for canola consists of 100 mM
Tris, 1 mM EDTA, 10% glycerol, 5 mM DTT, 1 mM BAM, 5 mM
ascorbate, 1.0 mg/ml BSA, pH 7.5 (4° C.). The extraction buffer
for tobacco consists of 100 mM Tris, 10 mM EDTA, 35 mM KCl, 20%
glycerol, 5 mM DTT, 1 mM BAM, 5 mM ascorbate, 1.0 mg/ml BSA,
pH 7.5 (4° C.). The mixture was transferred to a microfuge tube
and centrifuged for 5 minutes. The resulting supernatants were
desalted on spin G-50 (Pharmacia) columns, previously
equilibrated with extraction buffer (without BSA), in 0.25 ml
aliquots. The desalted extracts were assayed for EPSP synthase
activity by radioactive HPLC assay. Protein concentrations in
samples were determined by the BioRad microprotein assay with
BSA as the standard.

Protein concentrations were determined using the BioRad
Microprotein method. BSA was used to generate a standard curve
ranging from 2-24 µg. Either 800 µl of standard or diluted
sample was mixed with 200 µl of concentrated BioRad Bradford
reagent. The samples were vortexed and read at A(595) after ~5
minutes and compared to the standard curve.
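
By way of illustration only, the comparison of a sample reading to the
BSA standard curve can be sketched as a straight-line fit. This is a
minimal sketch and not part of the method described above: the function
names and all absorbance readings are hypothetical, and a linear
response across the 2-24 µg range is assumed.

```python
def fit_standard_curve(protein_ug, a595):
    """Least-squares line: a595 = slope * protein_ug + intercept."""
    n = len(protein_ug)
    mean_x = sum(protein_ug) / n
    mean_y = sum(a595) / n
    sxx = sum((x - mean_x) ** 2 for x in protein_ug)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(protein_ug, a595))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def protein_from_a595(reading, slope, intercept):
    """Invert the fitted line to estimate µg protein from an A(595) reading."""
    return (reading - intercept) / slope


# Hypothetical BSA standards spanning the 2-24 µg range named in the text.
standards_ug = [2.0, 6.0, 12.0, 18.0, 24.0]
standards_a595 = [0.11, 0.23, 0.41, 0.59, 0.77]  # made-up readings

slope, intercept = fit_standard_curve(standards_ug, standards_a595)
sample_ug = protein_from_a595(0.35, slope, intercept)  # made-up sample reading
```

A reading that falls outside the range of the standards would call for
further dilution of the sample rather than extrapolation of the line.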

EPSPS enzyme assays contained HEPES (50 mM),
shikimate-3-phosphate (2 mM), NH4 molybdate (0.1 mM) and KF
(5 mM), with or without glyphosate (0.5 or 1.0 mM). The assay
mix (30 µl) and plant extract (10 µl) were preincubated for 1
minute at 25° C. and the reactions were initiated by adding
14C-PEP (1 mM). The reactions were quenched after 3 minutes
with 50 µl of 90% EtOH/0.1M HOAc, pH 4.5. The samples were spun
at 6000 rpm and the resulting supernatants were analyzed for
14C-EPSP production by HPLC. Percent resistant EPSPS is
calculated from the EPSPS activities with and without
glyphosate.
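
The percent resistant EPSPS figure can be illustrated with a short
sketch. The text states only that the value is calculated from the
activities with and without glyphosate; the simple ratio used here, the
function name, and the example activities are assumptions made for
illustration.

```python
def percent_resistant_epsps(activity_with_glyphosate, activity_without_glyphosate):
    """Activity remaining in the presence of glyphosate, as a percentage of
    the activity of the same extract assayed without glyphosate.

    Both activities must be in the same units. The ratio form is an
    assumption; the source does not spell out the formula.
    """
    if activity_without_glyphosate <= 0:
        raise ValueError("activity without glyphosate must be positive")
    return 100.0 * activity_with_glyphosate / activity_without_glyphosate


# Hypothetical extract: 4.7 units with glyphosate, 10.0 units without.
example = percent_resistant_epsps(4.7, 10.0)
```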

The percent conversion of 14C labeled PEP to 14C EPSP was
determined by HPLC radioassay using a C18 guard column
(Brownlee) and an AX100 HPLC column (0.4×25 cm, Synchropak)
with 0.28M isocratic potassium phosphate eluant, pH 6.5, at
1 ml/min. Initial velocities were calculated by multiplying
fractional turnover per unit time by the initial concentration
of the labeled substrate (1 mM). The assay was linear with time
up to ~3 minutes and 30% turnover to EPSP. Samples were diluted
with 10 mM Tris, 10% glycerol, 10 mM DTT, pH 7.5 (4° C.) if
necessary to obtain results within the linear range.
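
The initial velocity calculation described above can be sketched as
follows. The function names and the example numbers are hypothetical;
the 1 mM substrate concentration and the linear-range limits (about 3
minutes and 30% turnover) are taken from the text.

```python
INITIAL_PEP_MM = 1.0          # initial labeled substrate concentration (mM)
MAX_LINEAR_MINUTES = 3.0      # assay reported linear up to about 3 minutes
MAX_LINEAR_TURNOVER = 0.30    # and up to 30% turnover


def initial_velocity(fraction_converted, minutes, substrate_mm=INITIAL_PEP_MM):
    """Fractional turnover per unit time multiplied by the initial
    substrate concentration, giving mM product formed per minute."""
    if minutes <= 0:
        raise ValueError("reaction time must be positive")
    return (fraction_converted / minutes) * substrate_mm


def within_linear_range(fraction_converted, minutes):
    """True when the measurement lies inside the stated linear range."""
    return minutes <= MAX_LINEAR_MINUTES and fraction_converted <= MAX_LINEAR_TURNOVER


# Hypothetical measurement: 15% of the labeled PEP converted in 3 minutes.
v0 = initial_velocity(0.15, 3.0)
```

A measurement for which `within_linear_range` is false corresponds to the
case in which the extract would be diluted and assayed again.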

In these assays DL-dithiothreitol (DTT), benzamidine (BAM), and
bovine serum albumin (BSA, essentially globulin free) were
obtained from Sigma. Phosphoenolpyruvate (PEP) was from
Boehringer Mannheim and phosphoenol-[1-14C]pyruvate
(28 mCi/mmol) was from Amersham.

EXAMPLES

Example 1 

Transformed tobacco plants have been generated with a number of
the Class II EPSPS gene vectors containing the CP4 EPSPS DNA
sequence as described above with suitable expression of the
EPSPS. These transformed plants exhibit glyphosate tolerance
imparted by the Class II CP4 EPSPS.

Transformation of tobacco employs the tobacco leaf disc
transformation protocol which utilizes healthy leaf tissue
about 1 month old. After a 15-20 minute surface sterilization
with 10% Clorox plus a surfactant, the leaves are rinsed 3
times in sterile water. Using a sterile paper punch, leaf discs
are punched and placed upside down on MS104 media (MS salts
4.3 g/l, sucrose 30 g/l, B5 vitamins 500× 2 ml/l, NAA 0.1 mg/l,
and BA 1.0 mg/l) for a 1 day preculture.

The discs are then inoculated with an overnight culture of a
disarmed Agrobacterium ABI strain containing the subject vector
that had been diluted 1/5 (i.e., about 0.6 OD). The inoculation
is done by placing the discs in centrifuge tubes with the
culture. After 30 to 60 seconds, the liquid is drained off and
the discs are blotted between sterile filter paper. The discs
are then placed upside down on MS104 feeder plates with a
filter disc to co-culture.

After 2-3 days of co-culture, the discs are transferred, still
upside down, to selection plates with MS104 media. After 2-3
weeks, callus tissue forms, and individual clumps are separated
from the leaf discs. Shoots are cleanly cut from the callus
when they are large enough to be distinguished from stems. The
shoots are placed on hormone-free rooting media (MSO: MS salts
4.3 g/l, sucrose 30 g/l, and B5 vitamins 500× 2 ml/l) with
selection for the appropriate antibiotic resistance. Root
formation occurs in 1-2 weeks. Any leaf callus assays are
preferably done on rooted shoots while still sterile. Rooted
shoots are then placed in soil and kept in a high humidity
environment (i.e., plastic containers or bags). The shoots are
hardened off by gradually exposing them to ambient humidity
conditions.

Expression of CP4 EPSPS protein in transformed plants 
Tobacco cells were transformed with a number of plant vectors
containing the native CP4 EPSPS gene, and using different
promoters and/or CTPs. Preliminary evidence for expression of
the gene was given by the ability of the leaf tissue from
antibiotic selected transformed shoots to recallus on
glyphosate. In some cases, glyphosate-tolerant callus was
selected directly following transformation. The level of
expression of the CP4 EPSPS was determined by the level of
glyphosate-tolerant EPSPS activity (assayed in the presence of
0.5 mM glyphosate) or by Western blot analysis using a goat
anti-CP4 EPSPS antibody. The Western blots were quantitated by
densitometer tracing and comparison to a standard curve
established using purified CP4 EPSPS. These data are presented
as % soluble leaf protein. The data from a number of
transformed plant lines and transformation vectors are
presented in Table VI below.

TABLE VI

Expression of CP4 EPSPS in transformed tobacco tissue

                             CP4 EPSPS**
Vector         Plant #       (% leaf protein)

pMON17110      25313         0.02
pMON17110      25329         0.04
pMON17116      25095         0.02
pMON17119      25106         0.09
pMON17119      25762         0.09
pMON17119      25767         0.03

**Glyphosate-tolerant EPSPS activity was also demonstrated in leaf extracts.



Glyphosate tolerance has also been demonstrated at the whole
plant level in transformed tobacco plants. In tobacco, R0
transformants of CTP2-CP4 EPSPS were sprayed at 0.4 lb/acre
(0.448 kg/hectare), a rate sufficient to kill control
non-transformed tobacco plants corresponding to a rating of 3,
1 and 0 at days 7, 14 and 28, respectively, and were analyzed
vegetatively and reproductively (Table VII).

TABLE VII

Glyphosate tolerance in tobacco CP4 transformants*

                       Score**
                       Vegetative
Vector/Plant #         day 7    day 14    day 28    Fertile

pMON17110/25313        6        4         2         no
pMON17110/25329        9        10        10        yes
pMON17119/25106        9        9         10        yes

*Spray rate = 0.4 lb/acre (0.448 kg/hectare)

**Plants are evaluated on a numerical scoring system of 0-10 where a
vegetative score of 10 represents no damage relative to nonsprayed controls
and 0 represents a dead plant. Reproductive scores (Fertile) are determined at
28 days after spraying and are evaluated as to whether or not the plant is
fertile.
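
As a check on the spray rate quoted above, the conversion from lb/acre
to kg/hectare can be worked through in a few lines. The constant and
function names are chosen here for illustration; the conversion factors
are the standard definitions of the avoirdupois pound and the
international acre.

```python
KG_PER_LB = 0.45359237          # exact definition of the avoirdupois pound
HA_PER_ACRE = 0.40468564224     # international acre expressed in hectares


def lb_per_acre_to_kg_per_ha(rate_lb_per_acre):
    """Convert an application rate from lb/acre to kg/hectare."""
    return rate_lb_per_acre * KG_PER_LB / HA_PER_ACRE


# The tobacco spray rate used in Table VII.
tobacco_rate = round(lb_per_acre_to_kg_per_ha(0.4), 3)
```

Rounded to three decimal places this reproduces the 0.448 kg/hectare
stated in the text.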

Example 2A 

Canola plants were transformed with the pMON17110, pMON17116,
and pMON17131 vectors and a number of plant lines of the
transformed canola were obtained which exhibit glyphosate
tolerance.
Plant Material

Seedlings of Brassica napus cv Westar were established in 2
inch (~5 cm) pots containing Metro Mix 350. They were grown in
a growth chamber at 24° C., 16/8 hour photoperiod, light
intensity of 400 µE m⁻² sec⁻¹ (HID lamps). They were fertilized
with Peters 20-10-20 General Purpose Special. After 2½ weeks
they were transplanted to 6 inch (~15 cm) pots and grown in a
growth chamber at 15°/10° C. day/night temperature, 16/8 hour
photoperiod, light intensity of 800 µE m⁻² sec⁻¹ (HID lamps).
They were fertilized with Peters 15-30-15 Hi-Phos Special.
Transformation/Selection/Regeneration 

Four terminal internodes from plants just prior to bolting or
in the process of bolting but before flowering were removed and
surface sterilized in 70% v/v ethanol for 1 minute, 2% w/v
sodium hypochlorite for 20 minutes and rinsed 3 times with
sterile deionized water. Stems with leaves attached could be
refrigerated in moist plastic bags for up to 72 hours prior to
sterilization. Six to seven stem segments were cut into 5 mm
discs with a Redco Vegetable Slicer 200 maintaining orientation
of basal end.

The Agrobacterium was grown overnight on a rotator at 24° C. in
2 ml of Luria Broth containing 50 mg/l kanamycin, 24 mg/l
chloramphenicol and 100 mg/l spectinomycin. A 1:10 dilution was
made in MS (Murashige and Skoog) media giving approximately
9×10^8 cells per ml. This was confirmed with optical density
readings at 660 nm. The stem discs (explants) were inoculated
with 1.0 ml of Agrobacterium and the excess was aspirated from
the explants.

The explants were placed basal side down in petri plates
containing 1/10× standard MS salts, B5 vitamins, 3% sucrose,
0.8% agar, pH 5.7, 1.0 mg/l 6-benzyladenine (BA). The plates
were layered with 1.5 ml of media containing MS salts, B5
vitamins, 3% sucrose, pH 5.7, 4.0 mg/l p-chlorophenoxyacetic
acid, 0.005 mg/l kinetin and covered with sterile filter paper.

Following a 2 to 3 day co-culture, the explants were
transferred to deep dish petri plates containing MS salts, B5
vitamins, 3% sucrose, 0.8% agar, pH 5.7, 1 mg/l BA, 500 mg/l
carbenicillin, 50 mg/l cefotaxime, 200 mg/l kanamycin or
175 mg/l gentamicin for selection. Seven explants were placed
on each plate. After 3 weeks they were transferred to fresh
media, 5 explants per plate. The explants were cultured in a
growth room at 25° C., continuous light (Cool White).
Expression Assay 

After 3 weeks shoots were excised from the explants. Leaf
recallusing assays were initiated to confirm modification of
shoots. Three tiny pieces of leaf tissue were placed on
recallusing media containing MS salts, B5 vitamins, 3% sucrose,
0.8% agar, pH 5.7, 5.0 mg/l BA, 0.5 mg/l naphthalene acetic
acid (NAA), 500 mg/l carbenicillin, 50 mg/l cefotaxime and
200 mg/l kanamycin or gentamicin or 0.5 mM glyphosate. The leaf
assays were incubated in a growth room under the same
conditions as explant culture. After 3 weeks the leaf
recallusing assays were scored for herbicide tolerance (callus
or green leaf tissue) or sensitivity (bleaching).
Transplantation 

At the time of excision, the shoot stems were dipped in
Rootone® and placed in 2 inch (~5 cm) pots containing Metro-Mix
350 and placed in a closed humid environment. They were placed
in a growth chamber at 24° C., 16/8 hour photoperiod,
400 µE m⁻² sec⁻¹ (HID lamps) for a hardening-off period of
approximately 3 weeks.

The seed harvested from R0 plants is R1 seed which gives rise
to R1 plants. To evaluate the glyphosate tolerance of an R0
plant, its progeny are evaluated. Because an R0 plant is
assumed to be hemizygous at each insert location, selfing
results in maximum genotypic segregation in the R1. Because
each insert acts as a dominant allele, in the absence of
linkage and assuming only one hemizygous insert is required for
tolerance expression, one insert would segregate 3:1, two
inserts, 15:1, three inserts 63:1, etc. Therefore, relatively
few R1 plants need be grown to find at least one resistant
phenotype.
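
The segregation ratios quoted above follow from the stated assumptions
(unlinked, dominant, hemizygous inserts; selfing). As a minimal
illustrative sketch, with function names chosen here for convenience:
each insert is absent from one quarter of the selfed progeny, so a
progeny plant lacks every insert with probability (1/4)^n, giving a
tolerant:sensitive ratio of (4^n - 1):1.

```python
from fractions import Fraction


def tolerant_to_sensitive_ratio(n_inserts):
    """Return k such that R1 progeny segregate k:1 (tolerant:sensitive)
    for n unlinked, dominant, hemizygous inserts after selfing."""
    return 4 ** n_inserts - 1


def prob_at_least_one_tolerant(n_inserts, n_plants):
    """Probability that at least one of n_plants R1 progeny carries at
    least one insert, under the same assumptions."""
    p_sensitive = Fraction(1, 4 ** n_inserts)
    return 1 - p_sensitive ** n_plants


ratios = [tolerant_to_sensitive_ratio(n) for n in (1, 2, 3)]
```

For a single insert, `prob_at_least_one_tolerant(1, 4)` is 255/256, which
is consistent with the remark that relatively few R1 plants need be
grown.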

Seed from an R0 plant is harvested, threshed, and dried before
planting in a glyphosate spray test. Various techniques have
been used to grow the plants for R1 spray evaluations. Tests
are conducted in both greenhouses and growth chambers. Two
planting systems are used: ~10 cm pots or plant trays
containing 32 or 36 cells. Soil used for planting is either
Metro 350 plus three types of slow release fertilizer or plain
Metro 350. Irrigation is either overhead in greenhouses or
sub-irrigation in growth chambers. Fertilizer is applied as
required in irrigation water. Temperature regimes appropriate
for canola were maintained. A sixteen hour photoperiod was
maintained. At the onset of flowering, plants are transplanted
to ~15 cm pots for seed production.

A spray "batch" consists of several sets of R1 progenies all
sprayed on the same date. Some batches may also include
evaluations of other than R1 plants. Each batch also includes
sprayed and unsprayed non-transgenic genotypes representing the
genotypes in the particular batch which were putatively
transformed. Also included in a batch is one or more
non-segregating transformed genotypes previously identified as
having some resistance.

Two to six plants from each individual R0 progeny are not
sprayed and serve as controls to compare and measure the
glyphosate tolerance, as well as to assess any variability not
induced by the glyphosate. When the other plants reach the 2-4
leaf stage, usually 10 to 20 days after planting, glyphosate is
applied at rates varying from 0.28 to 1.12 kg/ha, depending on
objectives of the study. Low rate technology using low volumes
has been adopted. A laboratory track sprayer has been
calibrated to deliver a rate equivalent to field conditions.

A scale of 0 to 10 is used to rate the sprayed plants for
vegetative resistance. The scale is relative to the unsprayed
plants from the same R0 plant. A 0 is death, while a 10
represents no visible difference from the unsprayed plant. A
higher number between 0 and 10 represents progressively less
damage as compared to the unsprayed plant. Plants are scored at
7, 14, and 28 days after treatment (DAT), or until bolting, and
a line is given the average score of the sprayed plants within
an R0 plant family.

Six integers are used to qualitatively describe the degree of
reproductive damage from glyphosate:

0: No floral bud development
2: Floral buds present, but aborted prior to opening
4: Flowers open, but no anthers, or anthers fail to extrude
   past petals
6: Sterile anthers
8: Partially sterile anthers
10: Fully fertile flowers

Plants are scored using this scale at or shortly after
initiation of flowering, depending on the rate of floral
structure development.
Expression of EPSPS in Canola

After the 3 week period, the transformed canola plants were
assayed for the presence of glyphosate-tolerant EPSPS activity
(assayed in the presence of glyphosate at 0.5 mM). The results
are shown in Table VIII.

TABLE VIII

Expression of CP4 EPSPS in transformed Canola plants

                               % resistant EPSPS activity
                               of Leaf extract
Vector            Plant #      (at 0.5 mM glyphosate)

Vector Control                 0
pMON17110         41           47
pMON17110         52           28
pMON17110         71           82
pMON17110         104          75
pMON17110         172          84
pMON17110         177          85
pMON17110         252          29*
pMON17110         350          49
pMON17116         40           25
pMON17116         99           87
pMON17116         175          94
pMON17116         178          43
pMON17116         182          18
pMON17116         252          69
pMON17116         298          44*
pMON17116         332          89
pMON17116         383          97
pMON17116         395          52

*assayed in the presence of 1.0 mM glyphosate

R1 transformants of canola were then grown in a growth chamber
and sprayed with glyphosate at 0.56 kg/ha (kilogram/hectare)
and rated vegetatively. These results are shown in Table
IXA-IXC. It is to be noted that expression of glyphosate
resistant EPSPS in all tissues is preferred to observe optimal
glyphosate tolerance phenotype in these transgenic plants. In
the Tables below, only expression results obtained with leaf
tissue are described.

TABLE IXA

Glyphosate tolerance in Class II EPSPS
(pMON17110 = P-E35S; pMON17116 = P-FMV35S; R1 plants;
Spray rate = 0.56 kg/ha)

                       % resistant    Vegetative Score**
Vector/Plant No.       EPSPS*         day 7     day 14

Control Westar         0              5         3
pMON17110/41           47             6         7
pMON17110/71           82             6         7
pMON17110/177          85             9         10
pMON17116/40           25             9         9
pMON17116/99           87             9         10
pMON17116/175          94             9         10
pMON17116/178          43             6         3
pMON17116/182          18             9         10
pMON17116/383          97             9         10

TABLE IXB

Glyphosate tolerance in Class II EPSPS
canola R1 transformants

                       Vegetative score**    Reproductive score
Vector/Plant No.       day 14                day 28

17131/78               10                    10
17131/102              9                     10
17131/115              9                     10
17131/116              9                     10
17131/157              9                     10
17131/169              10                    10
17131/255              10                    10
control Westar         1                     0
TABLE IXC

Glyphosate tolerance in Class I EPSPS canola transformants
(P-E35S; R2 Plants; Spray rate = 0.28 kg/ha)

                       % resistant     Vegetative Score**
Vector/Plant No.       EPSPS*          day 7      day 14

Control Westar          0              4          2
pMON899/715            96              5          6
pMON899/744            95              8          8
pMON899/7S4            86              6          4
pMON899/818            81              7          8
pMON899/885            57              7          6

*% resistant EPSPS activity in the presence of 0.5 mM glyphosate
**A vegetative score of 10 indicates no damage, a score of 0 is given to a dead plant.

The data obtained for the Class II EPSPS transformants may be compared to glyphosate-tolerant Class I EPSP transformants in which the same promoter is used to express the EPSPS genes and in which the level of glyphosate-tolerant EPSPS activity was comparable for the two types of transformants. A comparison of the data of pMON17110 [in Table IXA] and pMON17131 [Table IXB] with that for pMON899 [in Table IXC; the Class I gene in pMON899 is that from A. thaliana {Klee et al., 1987} in which the glycine at position 101 was changed to an alanine] illustrates that the Class II EPSPS is at least as good as that of the Class I EPSPS. An improvement in vegetative tolerance of Class II EPSPS is apparent when one takes into account that the Class II plants were sprayed at twice the rate and were tested as R1 plants.

Example 2B

The construction of two plant transformation vectors and the transformation procedures used to produce glyphosate-tolerant canola plants are described in this example. The vectors, pMON17209 and pMON17237, were used to generate transgenic glyphosate-tolerant canola lines. The vectors each contain the gene encoding the 5-enol-pyruvyl-shikimate-3-phosphate synthase (EPSPS) from Agrobacterium sp. strain CP4. The vectors also contain either the gox gene encoding the glyphosate oxidoreductase enzyme (GOX) from Achromobacter sp. strain LBAA (Barry et al., 1992) or the gene encoding a variant of GOX (GOX v.247) which displays improved catalytic properties. These enzymes convert glyphosate to aminomethylphosphonic acid and glyoxylate and protect the plant from damage by the metabolic inactivation of glyphosate. The combined result of providing an alternative, resistant EPSPS enzyme and the metabolism of glyphosate produces transgenic plants with enhanced tolerance to glyphosate.

Molecular biology techniques. In general, standard molecular biology and microbial genetics approaches were employed (Maniatis et al., 1982). Site-directed mutageneses were carried out as described by Kunkel et al. (1987). Plant-preferred genes were synthesized and the sequence confirmed.

Plant transformation vectors. The following describes the general features of the plant transformation vectors that were modified to form vectors pMON17209 and pMON17237. The Agrobacterium mediated plant transformation vectors contain the following well-characterized DNA segments which are required for replication and function of the plasmids (Rogers and Klee, 1987; Klee and Rogers, 1989). The first segment is the 0.45 kb ClaI-DraI fragment from the pTi15955 octopine Ti plasmid which contains the T-DNA left border region (Barker et al., 1983). It is joined to the 0.75 kb origin of replication (oriV) derived from the broad-host range plasmid RK2 (Stalker et al., 1981). The next segment is the 3.1 kb SalI-PvuI segment of pBR322 which provides the origin of replication for maintenance in E. coli and the bom site for the conjugational transfer into the Agrobacterium tumefaciens cells (Bolivar et al., 1977). This is fused to the 0.93 kb fragment isolated from transposon Tn7 which encodes bacterial spectinomycin and streptomycin resistance (Fling et al., 1985), a determinant for the selection of the plasmids in E. coli and Agrobacterium. It is fused to the 0.36 kb PvuI-BclI fragment from the pTiT37 plasmid which contains the nopaline-type T-DNA right border region (Fraley et al., 1985). Several chimeric genes engineered for plant expression can be introduced between the Ti right and left border regions of the vector. In addition to the elements described above, this vector also includes the 35S promoter/NPTII/NOS 3' cassette to enable selection of transformed plant tissues on kanamycin (Klee and Rogers, 1989; Fraley et al., 1983; and Odell, et al., 1985) within the borders. An "empty" expression cassette is also present between the borders and consists of the enhanced E35S promoter (Kay et al., 1987), the 3' region from the small subunit of RUBP carboxylase of pea (E9) (Coruzzi et al., 1984; Morelli et al., 1986), and a number of restriction enzyme sites that may be used for the cloning of DNA sequences for expression in plants. The plant transformation system based on Agrobacterium tumefaciens delivery has been reviewed (Klee and Rogers, 1989; Fraley et al., 1986). The Agrobacterium mediated transfer and integration of the vector T-DNA into the plant chromosome results in the expression of the chimeric genes conferring the desired phenotype in plants.

Bacterial Inoculum. The binary vectors are mobilized into Agrobacterium tumefaciens strain ABI by the triparental conjugation system using the helper plasmid pRK2013 (Ditta et al., 1980). The ABI strain contains the disarmed pTiC58 plasmid pMP90RK (Koncz and Schell, 1986) in the chloramphenicol resistant derivative of the Agrobacterium tumefaciens strain A208.

Transformation procedure. Agrobacterium inocula were grown overnight at 28° C in 2 ml of LBSCK (LBSCK is made as follows: LB liquid medium [1 liter volume]=10 g NaCl; 5 g Yeast Extract; 10 g tryptone; ... for 22 minutes. After autoclaving, add spectinomycin (50 mg/ml stock), 2 ml; kanamycin (50 mg/ml stock), 1 ml; and chloramphenicol (25 mg/ml stock), 1 ml). One day prior to inoculation, the Agrobacterium was subcultured by inoculating 200 ul into 2 ml of fresh LBSCK and grown overnight. For inoculation of plant material, the culture was diluted with MSO liquid medium to an A^ range of 0.2-0.4.

Seedlings of Brassica napus cv. Westar were grown in Metro Mix 350 (Hummert Seed Co., St. Louis, Mo.) in a growth chamber with a day/night temperature of 15/10° C, relative humidity of 50%, 16h/8h photoperiod, and at a light intensity of 500 umol m^-2 sec^-1. The plants were watered daily (via sub-irrigation) and fertilized every other day with Peter's 15:30:15 (Fogelsville, Pa.).

In general, all media recipes and the transformation protocol follow those in Fry et al. (1987). Five to six week-old Westar plants were harvested when the plants had bolted (but prior to flowering), the leaves and buds were removed, and the 4-5 inches of stem below the flower buds were used as the explant tissue source. Following sterilization with 70% ethanol for 1 min and 38% Clorox for 20 min,
the stems were rinsed three times with sterile water and cut into 5 mm-long segments (the orientation of the basal end of the stem segments was noted). The plant material was incubated for 5 minutes with the diluted Agrobacterium culture at a rate of 5 ml of culture per 5 stems. The suspension of bacteria was removed by aspiration and the explants were placed basal side down (for an optimal shoot regeneration response) onto co-culture plates (1/10 MSO solid medium with a 1.5 ml TXD (tobacco xanthi diploid) liquid medium overlay and covered with a sterile 8.5 cm filter paper). Fifty-to-sixty stem explants were placed onto each co-culture plate.

After a 2 day co-culture period, stem explants were moved onto MS medium containing 750 mg/l carbenicillin, 50 mg/l cefotaxime, and 1 mg/l BAP (benzylaminopurine) for 3 days. The stem explants were then placed for two periods of three weeks each, again basal side down and with ... explants per ... mM glyphosate selection medium (also containing carbenicillin, cefotaxime, and BAP (The glyphosate stock [0.5M] is prepared as described in the following: 8.45 g glyphosate [analytical grade] is dissolved in 50 ml deionized water, adding KOH pellets to dissolve the glyphosate, and the volume is brought to 100 ml following adjusting the pH to 5.7. The solution is filter-sterilized and stored at 4° C). After 6 weeks on this glyphosate selection medium, green, normally developing shoots were excised from the stem explants and were placed onto fresh MS medium containing 750 mg/l carbenicillin, 50 mg/l cefotaxime, and 1 mg/l BAP, for further shoot development. When the shoots were 2-3 inches tall, a fresh cut at the end of the stem was made, the cut end was dipped in Root-tone, and the shoot was placed in Metro Mix 350 soil and allowed to harden-off for 2-3 weeks.

Construction of Canola transformation vector pMON17209. The EPSPS gene was isolated originally from Agrobacterium sp. strain CP4 and expresses a highly tolerant enzyme. The original gene contains sequences that could be inimical to high expression of the gene in some plants. These sequences include potential polyadenylation sites that are often A+T rich, a higher G+C % than that frequently found in dicotyledonous plant genes (63% versus ~50%), concentrated stretches of G and C residues, and codons that may not be used frequently in dicotyledonous plant genes. The high G+C % in the CP4 EPSPS gene could also result in the formation of strong ... coding sequence was expressed in E. coli from a PRecA-gene10L vector (Olins et al., 1988) and the EPSPS activity was compared with that from the native CP4 EPSPS gene. The appKm for PEP for the native and synthetic genes was 11.8 uM and 12.7 uM, respectively, indicating that the enzyme expressed from the synthetic gene was unaltered. The N-terminus of the coding sequence was then mutagenized to place an SphI site (GCATGC) at the ATG to permit the construction of the CTP2-CP4 synthetic fusion for chloroplast import. This change had no apparent effect on the in vivo activity of CP4 EPSPS in E. coli as judged by complementation of the aroA mutant. A CTP-CP4 EPSPS fusion was constructed between the Arabidopsis thaliana EPSPS CTP (Klee et al., 1987) and the CP4 EPSPS coding sequences. The Arabidopsis CTP was engineered by site-directed mutagenesis to place a SphI restriction site at the CTP processing site. This mutagenesis replaced the Glu-Lys at this location with Cys-Met. The CTP2-CP4 EPSPS fusion was tested for import into chloroplasts isolated from Lactuca sativa using the methods described previously (della-Cioppa et al., 1986; 1987).

The GOX gene that encodes the glyphosate metabolizing enzyme glyphosate oxidoreductase (GOX) was cloned originally from Achromobacter sp. strain LBAA (Hallas et al., 1988; Barry et al., 1992). The gox gene from strain LBAA was also resynthesized in a ... and in which many of the restriction sites were removed (PCT Appln. No. WO 92/00377). The GOX protein is targeted to the plastids by a fusion between the C-terminus of a CTP and the N-terminus of GOX. A CTP, derived from the SSU1A gene from Arabidopsis thaliana (Timko et al., 1988) was used. This CTP (CTP1) was constructed by a combination of site-directed mutageneses. The CTP1 is made up of the SSU1A CTP (amino acids 1-55), the first 23 amino acids of the mature SSU1A protein (56-78), a serine residue (amino acid 79), a new segment that repeats amino acids 50 to 56 from the CTP and the first two from the mature protein (amino acids 80-87), and an alanine and methionine residue (amino acid 88 and 89). An NcoI restriction site is located at the 3' end (spans the Met89 codon) to facilitate the construction of precise fusions to the 5' of GOX. At a later stage, a BglII site was introduced upstream of the N-terminus of the SSU1A sequences to facilitate the introduction of the fusions into plant transformation vectors. A fusion was assembled between CTP1 and the synthetic GOX gene.

The CP4 EPSPS and GOX genes were combined to form pMON17209 as described in the following. The CTP2-CP4 was assembled and inserted between the ... (... 1989; Richins et al., 1987) and the E9 3' region (Coruzzi et al., 1984; ...) ... this completed element may then be ... to other vectors. The CTP1-GOX fusion was also assembled in a pUC vector with the ... 35S promoter. This ... the plant transformation vector pMON10098 and joined to ... 35S/CTP2-CP4 EPSPS/E9 3' element from pMON17190 ... The kanamycin plant ...

Construction of Canola transformation vector pMON17237. The GOX enzyme has an apparent Km for glyphosate [appKm(glyphosate)] of ~25 mM. In an effort to improve the effectiveness of the glyphosate metabolic rate in planta, a variant of GOX has been identified in which the appKm(glyphosate) has been reduced approximately 10-fold; this variant is referred to as GOX v.247 and the sequence differences between it and the original plant-preferred GOX are illustrated in PCT Appln. No. WO 92/00377. The GOX v.247 coding sequence was combined with CTP1 and assembled with the FMV35S promoter and the E9 3' by cloning into the pMON17227 plant transformation vector to form pMON17241. In this vector, effectively, the CP4 EPSPS was replaced by GOX v.247. The pMON17227 vector had been constructed by replacing the CTP1-GOX sequences in pMON17193 with those for the CTP2-CP4 EPSPS, to form pMON17199 and followed by deleting the kanamycin cassette (as described above for pMON17209). The pMON17237 vector (FIG. 25) was then completed by cloning the FMV35S/CTP2-CP4 EPSPS/E9 3' element as a NotI-NotI fragment into pMON17241.
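The practical effect of the roughly 10-fold lower appKm(glyphosate) reported for GOX v.247 can be illustrated with the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). The sketch below is illustrative only: it assumes simple Michaelis-Menten kinetics and an unchanged Vmax for the variant, and the glyphosate concentrations in the usage loop are hypothetical. None of these assumptions is stated in the text; only the two Km values are taken from it.

```python
def fractional_rate(substrate_mm: float, km_mm: float) -> float:
    """Return v/Vmax under simple Michaelis-Menten kinetics."""
    if substrate_mm < 0 or km_mm <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_mm / (km_mm + substrate_mm)


KM_GOX_MM = 25.0               # appKm(glyphosate) of GOX quoted in the text (~25 mM)
KM_V247_MM = KM_GOX_MM / 10.0  # "approximately 10-fold" lower for GOX v.247


def fold_improvement(substrate_mm: float) -> float:
    """Rate of GOX v.247 relative to GOX, ASSUMING both share the same Vmax.

    Dividing the two rate laws cancels Vmax and [S] in the numerator,
    leaving (Km_GOX + [S]) / (Km_v247 + [S]).
    """
    if substrate_mm < 0:
        raise ValueError("substrate must be >= 0")
    return (KM_GOX_MM + substrate_mm) / (KM_V247_MM + substrate_mm)


if __name__ == "__main__":
    for s in (0.1, 1.0, 10.0):  # hypothetical glyphosate concentrations, in mM
        print(f"[S] = {s:5.1f} mM   v247/GOX rate ratio = {fold_improvement(s):.2f}")
```

Under these assumptions the advantage of the variant approaches 10-fold when glyphosate is well below both Km values and shrinks toward 1 as the concentration rises far above them.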
Example 3

Soybean plants were transformed with the pMON13640 (FIG. 15) vector and a number of plant lines of the transformed soybean were obtained which exhibit glyphosate tolerance.

Soybean plants are transformed with pMON13640 by the method of microprojectile injection using particle gun technology as described in Christou et al. (1988). The seed harvested from R0 plants is R1 seed which gives rise to R1 plants. To evaluate the glyphosate tolerance of an R0 plant, its progeny are evaluated. Because an R0 plant is assumed to be hemizygous at each insert location, selfing results in maximum genotypic segregation in the R1. Because each insert acts as a dominant allele, in the absence of linkage and assuming only one hemizygous insert is required for tolerance expression, one insert would segregate 3:1, two inserts, 15:1, three inserts 63:1, etc. Therefore, relatively few R1 plants need be grown to find at least one resistant phenotype.
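The segregation ratios quoted above follow directly from the stated assumptions: with n unlinked hemizygous inserts, each acting as a dominant allele, a selfed R0 plant yields R1 progeny of which a fraction (1/4)^n carries no insert at all. The following sketch works through that arithmetic; the function names are illustrative and not part of the original text.

```python
from fractions import Fraction


def sensitive_fraction(n_inserts: int) -> Fraction:
    """Expected fraction of R1 progeny that inherit none of the inserts."""
    if n_inserts < 1:
        raise ValueError("n_inserts must be at least 1")
    # Each hemizygous insert is absent from 1/4 of selfed progeny;
    # unlinked inserts segregate independently, so the fractions multiply.
    return Fraction(1, 4) ** n_inserts


def segregation_ratio(n_inserts: int) -> int:
    """Tolerant plants expected per one sensitive plant, i.e. (4**n - 1):1."""
    return 4 ** n_inserts - 1


def prob_at_least_one_tolerant(n_inserts: int, n_plants: int) -> Fraction:
    """Chance that a sample of n_plants R1 seedlings holds a tolerant plant."""
    if n_plants < 1:
        raise ValueError("n_plants must be at least 1")
    return 1 - sensitive_fraction(n_inserts) ** n_plants


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} insert(s): {segregation_ratio(n)}:1 tolerant:sensitive")
    # Even a handful of seedlings almost always includes a tolerant plant.
    print(float(prob_at_least_one_tolerant(1, 5)))
```

Because the sensitive fraction is raised to the power of the sample size, the chance of missing every tolerant plant falls off very quickly, which is why relatively few R1 plants need to be grown.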

Seed from an R0 soybean plant is harvested, and dried before planting in a glyphosate spray test. Seeds are planted into 4 inch (~10 cm) square pots containing Metro 350. Twenty seedlings from each R0 plant is considered adequate for testing. Plants are maintained and grown in a greenhouse environment. A 12.5-14 hour photoperiod and temperatures of 30° C day and 24° C night are regulated. Water soluble Peters Pete Lite fertilizer is applied as needed.

A spray "batch" consists of several sets of R1 progenies all sprayed on the same date. Some batches may also include evaluations of other than R1 plants. Each batch also includes sprayed and unsprayed non-transgenic genotypes representing the genotypes in the particular batch which were putatively transformed. Also included in a batch is one or more non-segregating transformed genotypes previously identified as having some resistance.

One to two plants from each individual R0 progeny are not sprayed and serve as controls to compare and measure the glyphosate tolerance, as well as to assess any variability not induced by the glyphosate. When the other plants reach the first trifoliate leaf stage, usually 2-3 weeks after planting, glyphosate is applied at a rate equivalent of 128 oz./acre (8.895 kg/ha) of Roundup®. A laboratory track sprayer has been calibrated to deliver a rate equivalent to those conditions.

A vegetative score of 0 to 10 is used. The score is relative to the unsprayed progenies from the same R0 plant. A 0 is death, while a 10 represents no visible difference from the unsprayed plant. A higher number between 0 and 10 represents progressively less damage as compared to the unsprayed plant. Plants are scored at 7, 14, and 28 days after treatment (DAT). The data from the analysis of one set of transformed and control soybean plants are described in Table X and show that the CP4 EPSPS gene imparts glyphosate tolerance in soybean also.

TABLE X

Glyphosate tolerance in Class II EPSPS soybean transformants
(P-E35S, P-FMV35S; R0 plants; Spray rate = 128 oz./acre)

                       Vegetative score
Vector/Plant No.       day 7      day 14     day 28

13640/40-11            5          6          7
13640/40-3             9          10         10
13640/40-7             4          7          7
control A5403          2          1          0
control A5403          1          1          0





The CP4 EPSPS gene may be used to select transformed plant material directly on media containing glyphosate. The ability to select and to identify transformed plant material depends, in most cases, on the use of a dominant selectable marker gene to enable the preferential and continued growth of the transformed tissues in the presence of a normally inhibitory substance. Antibiotic resistance and herbicide tolerance genes have been used almost exclusively as such dominant selectable marker genes in the presence of the corresponding antibiotic or herbicide. The nptII/kanamycin selection scheme is probably the most frequently used. It has been demonstrated that CP4 EPSPS is also a useful and perhaps superior selectable marker/selection scheme for producing and identifying transformed plants.

A plant transformation vector that may be used in this scheme is pMON17227 (FIG. 16). This plasmid resembles many of the other plasmids described infra and is essentially composed of the previously described bacterial replicon system that enables this plasmid to replicate in E. coli and to be introduced into and to replicate in Agrobacterium, the bacterial selectable marker gene (Spc/Str), and located between the T-DNA right border and left border is the CTP2-CP4 synthetic gene in the FMV35S promoter-E9 3' cassette. This plasmid also has single sites for a number of restriction enzymes, located within the borders and outside of the expression cassette. This makes it possible to easily add other genes and genetic elements to the vector for introduction into plants.

The protocol for direct selection of transformed plants on glyphosate is outlined for tobacco. Explants are prepared for pre-culture as in the standard procedure as described in Example 1: surface sterilization of leaves from 1 month old tobacco plants (15 minutes in 10% Clorox+surfactant; 3x dH2O washes); explants are cut in 0.5x0.5 cm squares, removing leaf edges, mid-rib, tip, and petiole end for uniform tissue type; explants are placed in single layer, upside down, on MS104 plates+2 ml 4C005K media to moisten surface; pre-culture 1-2 days. Explants are inoculated using overnight culture of Agrobacterium containing the plant transformation plasmid that is adjusted to a titer of 1.2x10* bacteria/ml with 4C005K media. Explants are placed into a centrifuge tube, the Agrobacterium suspension is added and the mixture of bacteria and explants is "Vortexed" on maximum setting for 25 seconds to ensure even penetration of bacteria. The bacteria are poured off and the explants are blotted between layers of dry sterile filter paper to remove excess bacteria. The blotted explants are placed upside down on MS104 plates+2 ml 4C005K media+filter disc. Co-culture is 2-3 days. The explants are transferred to MS104+carbenicillin 1000 mg/l+cefotaxime 100 mg/l for 3 days (delayed phase). The explants are then transferred to MS104+glyphosate 0.05 mM+carbenicillin 1000 mg/l+cefotaxime 100 mg/l for selection phase. At 4-6 weeks shoots are cut from callus and placed on MSO+carbenicillin 500 mg/l rooting media. Roots form in 3-5 days, at which time leaf pieces can be taken from rooted plates to confirm glyphosate tolerance and that the material is transformed.

The presence of the CP4 EPSPS protein in these transformed tissues has been confirmed by immunoblot analysis of leaf discs. The data from one experiment with pMON17227 is presented in the following: 139 shoots formed on glyphosate from 400 explants inoculated with Agrobacterium ABI/pMON17227; 97 of these were positive on recallusing on glyphosate. These data indicate a transformation rate of 24 per 100 explants, which makes this a





highly efficient and time saving transformation procedure for plants. Similar transformation frequencies have been obtained with pMON17131 and direct selection of transformants on glyphosate with the CP4 EPSPS genes has also been shown in other plant species, including Arabidopsis, soybean, corn, wheat, potato, tomato, cotton, lettuce, and sugarbeet.

The pMON17227 plasmid contains single restriction enzyme recognition cleavage sites (NotI, ... and BstXI) between the CP4 glyphosate selection region and the left border of the vector for the cloning of additional genes and to facilitate the introduction of these genes into plants.

Example 5A

The CP4 EPSPS gene has also been introduced into Black Mexican Sweet (BMS) corn cells with expression of the protein and glyphosate resistance detected in callus.

The backbone for this plasmid was a derivative of the high copy plasmid pUC119 (Vieira and Messing, 1987). The 1.3 Kb FspI-DraI pUC119 fragment containing the origin of replication was fused to the 1.3 Kb SmaI-HindIII filled fragment from pKC7 (Rao and Rogers, 1979) which contains the neomycin phosphotransferase type II gene to confer bacterial kanamycin resistance. This plasmid was used to construct a monocot expression cassette vector containing the 0.6 kb cauliflower mosaic virus (CaMV) 35S RNA promoter with a duplication of the -90 to -300 region (Kay et al., 1987), an 0... kb fragment containing an intron from a maize gene in the 5' untranslated leader region, followed by a polylinker and the 3' termination sequences from the ... fragment containing the 300 bp chloroplast transit peptide from the Arabidopsis EPSP synthase fused in frame to the 1.4 Kb coding sequence for the bacterial CP4 EPSP synthase was inserted into the monocot expression cassette in the polylinker between the intron and the NOS termination sequence to form the plasmid pMON19653 (FIG. 17).

pMON19653 DNA was introduced into Black Mexican Sweet (BMS) cells by co-bombardment with EC9, a plasmid containing a sulfonylurea-resistant form of the maize acetolactate synthase gene. 2.5 mg of each plasmid was coated onto tungsten particles and introduced into log-phase BMS cells using a PDS-1000 particle gun essentially as described (Klein et al., 1989). Transformants are selected on MS medium containing 20 ppb chlorsulfuron. After initial selection on chlorsulfuron, the calli can be assayed directly by Western blot. Glyphosate tolerance can be assessed by transferring the calli to medium containing 5 mM glyphosate. As shown in Table XI, CP4 EPSPS confers glyphosate tolerance to corn callus.

TABLE XI

Glyphosate resistance in BMS Corn Callus using pMON 19653

                                 resistant      # cross-resistant
Vector                           lines          to Glyphosate

19653            253             120            81/120 = 67.5%
19653            254              80            37/80 = 46%
EC9 control      253/254           8            0/8 = 0%

To measure CP4 EPSPS expression in corn callus, the following procedure was used: BMS callus (3 g wet weight) was dried on filter paper (Whatman #1) under vacuum, reweighed, and extraction buffer (500 ul/g dry weight; 100 mM Tris, 1 mM EDTA, 10% glycerol) was added. The tissue was homogenized with a Wheaton overhead stirrer for 30 seconds at 2.8 power setting. After centrifugation (3 minutes, Eppendorf microfuge), the supernatant was removed and the protein was quantitated (BioRad Protein Assay). Samples (50 ug/well) were loaded on an SDS PAGE ... electrophoresed, and transferred to nitrocellulose similarly to ... nitrocellulose blot was probed with goat anti-CP4 EPSPS IgG, and developed with I-125 Protein G. The radioactive blot was visualized by autoradiography. Results were quantitated by densitometry on an LKB UltroScan XL laser densitometer and are tabulated below in Table XII.

TABLE XII

Expression of CP4 in BMS Corn Callus - pMON 19653

             CP4 expression
Line         (% extract protein)

284          0.006
287          0.036
290          0.061
295          0.073
299          0.113
309          0.042
313          0.003

Improvements in the expression of Class II EPSPS could also be achieved by expressing the gene using stronger plant promoters, using better 3' polyadenylation signal sequences, optimizing the sequences around the initiation codon for ribosome loading and translation initiation, or by combination of these or other expression or regulatory sequences or factors.

Example 5B

The plant-expressible genes encoding the CP4 EPSPS and a glyphosate oxidoreductase enzyme (PCT Pub. No. WO92/00377) were introduced into embryogenic corn callus through particle bombardment. Plasmid DNA was prepared using standard procedures (Ausubel et al., 1987), cesium-chloride purified, and re-suspended at 1 mg/ml in TE buffer. DNA was precipitated onto M10 tungsten or 1.0 ug gold particles (BioRad) using a calcium chloride/spermidine precipitation protocol, essentially as described by Klein et al. (1987). The PDS1000® gunpowder gun (BioRad) was used. Callus tissue was obtained by isolating 1-2 mm long immature embryos from the "Hi-II" genotype (Armstrong et al., 1991), or Hi-II X B73 crosses, onto a modified N6 medium (Armstrong and Green, 1985; Songstad et al., 1991). Embryogenic callus ("type-II"; Armstrong and Green, 1985) initiated from these embryos was maintained by subculturing at two week intervals, and was bombarded when less than two months old. Each plate of callus tissue was bombarded from 1 to 3 times with either tungsten or gold particles coated with the plasmid DNA(s) of interest. Callus was transferred to a modified N6 medium containing an appropriate selective agent (either glyphosate, or one or more of the antibiotics kanamycin, G418, or paromomycin) 1-8 days following bombardment, and then re-transferred to fresh selection media at 2-3 week intervals. Glyphosate-resistant calli first appeared approximately 6-12 weeks post-bombardment. These resistant calli were propagated on selection medium, and samples were taken for assays of gene expression. Plant regeneration from resistant calli was accomplished essentially as described by Petersen et al. (1992).

In some cases, both gene(s) were covalently linked together on the same plasmid DNA molecule. In other instances, the genes were present on separate plasmids, but were introduced into the same plant through a process termed "co-transformation". The 1 mg/ml plasmid preparations of interest were mixed together in an equal ratio, by volume, and then precipitated onto the tungsten or gold particles. At a high frequency, as described in the literature (e.g., Schocher et al., 1986), the different plasmid molecules integrate into the genome of the same plant cell. Generally the integration is into the same chromosomal location in the plant cell, presumably due to recombination of the plasmids prior to integration. Less frequently, the different plasmids integrate into separate chromosomal locations. In either case, there is integration of both DNA molecules into the same plant cell, and any plants produced from that cell.

Transgenic corn plants were produced as described above which contained a plant-expressible CP4 gene and a plant-expressible gene encoding a glyphosate oxidoreductase enzyme.

The plant-expressible CP4 gene comprised a structural DNA sequence encoding a CTP2/CP4 EPSPS fusion protein. The CTP2/CP4 EPSPS is a gene fusion composed of the N-terminal 0.23 Kb chloroplast transit peptide sequence from the Arabidopsis thaliana EPSPS gene (Klee et al. 1987, referred to herein as CTP2), and the C-terminal 1.36 Kb 5-enolpyruvylshikimate-3-phosphate synthase gene (CP4) from an Agrobacterium species. Plant expression of the gene fusion produces a pre-protein which is rapidly imported into chloroplasts where the CTP is cleaved and degraded (della-Cioppa et al., 1986) releasing the mature CP4 protein.

tobacco transformed with pMON17206 (infra) are presented in Table XIII.

TABLE XIII

Tobacco Glyphosate Spray Test
(pMON17206: E35S - CTP2-LBAA EPSPS: 0.4 lbs/ac)

Line                 7 Day Rating

33358                9
34586                9
33328                9
34606                9
33377                9
34611                10
34607                10
34601                9
34589                9
Samson (Control)     4

From the foregoing, it will be recognized that this invention is one well adapted to attain all the ends and objects hereinabove set forth together with advantages which are obvious and which are inherent to the invention. It will be further understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims. Since many possible embodiments may be made of the invention without departing from the scope thereof, it is to be understood that all matter herein set forth or shown in the accompanying drawings is to be interpreted as illustrative and not in a limiting sense.
SEQUENCE LISTING

( 1 ) GENERAL INFORMATION: 

( i i i ) NUMBER OF SEQUENCES: 69 

( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 597 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TCATCAAAAT ATTTAOCAGC ATTCCAGATT GOGTTCAATC AACAAGOTAC OAOCCATATC 60 

ACTTTATTCA AATTOOTATC GCCAAAACCA AO A AGO AACT CCCATCCICA AAOGTTTOTA 120 

AGGAAOAATT CTC AGTCCAA AGCCTCAACA AOGTCAGGOT ACAGAGTCTC CAAACCATTA 180 

GCCAAAAGCT ACAGGAGATC A A TG A AG A AT CTTCAATCAA AGTAAACTAC TOTTCCAGCA 240 

CATOCATCAT GGTCAGTAAG TTTCAGAAAA AOACATCCAC COAAOACTTA AAGTTAGTGG 3 00 

GCATCTTTOA AAOTAATCTT GTCAACATCO AGCAGCTGGC TTGTGGGOAC CAOACAAAAA 3 60 

AOG AATGGTG C AG AATTGTT AOGCGCACCT ACCAAAAOCA TCTTTOCCTT TATTGCAAAG 420 

ATAAAGCAGA TTCCTCT AGT ACAAGTGGGG AA C AAA AT AA COTOOAAAAO AGCTGTCCTO 48 0 

ACAOCCCACT C ACT AATGCG TATGACGAAC GCAGTGACGA CCACAAAAGA ATTCCCTCTA 540 

TAT A AG A AGO CATTCATTCC CATTTGAAGG ATCATCAGAT ACT A AC C AAT ATTTCTC 597 



( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 1982 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)



( i x ) FEATURE: 

( A ) NAME/KEY: CDS 
( B ) LOCATION: 62-1426 
( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AAOCCCOCOT TCTCTCCOOC OCTCCOCCCG OAOAOCCOTO OAT AO AT TA A OOAAOACGCC 60 

C ATO TCO CAC GOT OCA AOC AOC COO CCC OCA ACC OCC COC AAA TCC 106 

Met Ser Bit Oly Ala Sor Ser Ar g Pro Ala Tbr Ala Arg Lyn Ser 

1 5 10 15 

TCT GOC CTT TCC OOA ACC OTC COC ATT CCC OOC OAC AAO TCO ATC TCC 154 

Ser Oly Lev Ser Oly Tbr Val Arg lie Pro Oly Asp Lys Sor lie Ser 

2 0 2 5 3 0 

CAC COO TCC TTC ATO TTC OOC OOT CTC GCO AGC GOT OAA ACO COC ATC 202 

His Arg Ser Phe Met Pbe Gly Gly Leu Ala Ser Oly Olo Thr Arg lie 

3 5 40 4 5 

ACC GOC CTT CTO OAA OOC OAO OAC OTC ATC A AT ACO OOC AAO OCC AT G 250 

Thr Gly Leu Leo Glo Oly Gin Asp Val lie Asa Thr Gly Lys Ala Met 

5 0 5 5 60 

CAG OCC ATO OOC OCC AGO ATC COT AAO OAA OOC OAC ACC TGO ATC ATC 2 98 

Ola Ala Met Gly Ala Arg lie Arg Lys Glo Gly Asp Tbr Trp lie lie 

65 70 75 

OAT OOC OTC OGC AAT GGC OGC CTC CTO OCG CCT OAO OCO CCO CTC OAT 3 46 

Asp Oly Val Oly Asa Oly Oly Leo Leo Ala Pro Olo Ala Pro Lea Asp 

80 85 90 95 

TTC OGC AAT OCC OCC ACO OOC TOC CGC CTO ACC ATO OOC CTC OTC OGO 394 

Pbe Oly Aid Ala Ala Thr Oly Cys Arg Leo Thr Met Oly Lea Val Oly 

10 0 10 5 110 

OTC TAC GAT TTC OAC AGC ACC TTC ATC GGC G AC OCC TCO CTC ACA A AG 442 

Val Tyr Asp Pbe Asp Ser Thr Phe lie Gly Asp Ala Ser Loo Thr Lys 

115 12 0 12 5 

CGC CCO ATO OOC COC OTO TTO AAC CCG CTO CGC OAA ATO OGC GTO CAG 490 

Arg Pro Met Oly Arg Val Leo Asa Pro Lea Arg Ola Met Gly Val Gla 

13 0 13 5 140 

GTO AAA TCG OAA GAC GOT OAC CGT CTT CCC OTT ACC TTO COC OGO CCO 53 8 

Val Lys Ser Olo Asp Oly Asp Arg Leo Pro Val Thr Leu Arg Oly Pro 

145 15 0 15 5 

A AG ACQ CCG ACO CCO ATC ACC TAC CGC OTO CCO ATO OCC TCC GCA CAG 586 

Lys Thr Pro Thr Pro Ilo Thr Tyr Arg Val Pro Met Ala Ser Ala Gla 

160 165 170 175 

OTO AAO TCC OCC GTO CTO CTC GCC GOC CTC AAC ACO CCC GOC ATC ACO 634 

Val Lys Ser Ala Val Leo Leu Ala Oly Leo Asn Tbr Pro Oly lie Thr 

180 185 190 

ACO OTC ATC GAG CCG ATC ATO ACO COC OAT CAT ACO OAA AAO ATO CTG 6 82 

Thr Val lie Olo Pro lie Met Thr Arg Asp Bis Thr Glo Lys Met Leo 

195 200 205 

CAG OGC TTT OOC OCC AAC CTT ACC OTC GAG ACQ OAT OCG GAC GGC GTG 730 

Ola Gly Phe Gly Ala Asa Leo Thr Val Glo Thr Asp Ala Asp Oly Val 

2 10 2 15 2 2 0 

CGC ACC ATC COC CTG OAA GGC CGC OOC AAO CTC ACC OOC CAA OTC ATC 778 

Arg Thr lie Arg Leo Ola Gly Arg Oly Lys Leo Thr Oly Ola Val lie 

225 230 235 

OAC OTO CCO GGC GAC CCO TCC TCO ACO GCC TTC CCO CTG GTT GCO OCC 826 

Asp Val Pro Oly A§p Pro Ser Ser Thr Ala Phe.Pro Leo Val Ala Ala 

240 245 250 25 5 

CTO CTT GTT CCG GOC TCC GAC OTC ACC ATC CTC AAC GTG CTO ATO AAC 874 

Leo Leo Val Pro Oly Ser Asp Val Thr lie Lea Asa Val Loo Met Asa 

260 265 270 

CCC ACC COC ACC OGC CTC ATC CTG ACO CTG CAG OAA ATO OOC OCC GAC 922 

Pro Thr Arg Thr Gly Leo lie Leo Thr Leo Gla Olo Met Gly Ala Asp 

275 280 285 

ATC G A A OTC ATC AAC CCO COC CTT GCC GGC GOC OAA GAC GTG GCO OAC 970 

lie Glo Val lie Asa Pro Arg Leo Ala Oly Oly Glo Asp Val Ala Asp 

290 295 300 



CTO COC GTT COC TCC TCC ACO CTQ A AG OOC GTC ACQ OTG CCO GAA G AC 10 18 

Leo Aig Val Arg Ser Ser Thr Leo Lyi Gly Val Thr Val Pro Glo Asp 

3 05 3 10 3 15 

COC OCG CCT TCG ATO ATC OAC GAA TAT CCO ATT CTC OCT GTC GCC OCC 1066 

Arg Ala Pro Ser Met lie Asp Glo Tyr Pro lie Leo Ala Val Ala Ala 

320 325 330 335 

GCC TTC OCG GAA GGG OCG ACC OTG ATO AAC GOT CTO GAA GAA CTC CGC 1114 

Ala Phe Ala Glo Gly Ala Thr Val Met Aso Gly Leo Glo Glo Leo Arg 

340 345 350 

OTC AAO GAA AGC OAC COC CTC TCG OCC OTC OCC AAT OGC CTC A AG CTC 1162 

Val Lys Glo Set Aip Arg Leo Set Ala Val Ala Asa Gly Leo Lys Leo 

355 360 365 

AAT GGC OTG OAT TOC OAT GAG OOC O AG ACO TCG CTC GTC OTO COC OOC 1210 

Aso Gly Val Asp Cys Asp Olo Gly Glo Thr Ser Lea Val Val Arg Gly 

370 375 380 

COC CCT OAC GGC AAG GGG CTC OOC AAC OCC TCG OGC OCC GCC OTC GCC 1258 

Arg Pro Asp Gly Lyi Gly Leo Gly Asa Ala Ser Gly Ala Ala Val Ala 

3 85 390 395 

ACC CAT CTC OAT CAC CGC ATC GCC ATG AGC TTC CTC GTC ATG OGC CTC 1306 

Thr His Leo Asp His Arg lie Ala Met Ser Phe Leu Val Met Gly Leo 

400 40 5 410 415 

GTO TCG GAA AAC CCT GTC ACO OTG OAC OAT OCC ACG ATG ATC GCC ACO 1354 

Val Ser Gin Asa Pro Val Thr Val Asp Asp Ala Thr Met lie Ala Thr 

420 425 430 

AGC TTC CCG GAO TTC ATO G AC CTG ATO OCC GGG CTO OGC OCG AAO ATC 1402 

Sor Phe Pro Olo Phe Met Asp Lea Met Ala Oly Leo Gly Ala Lys lie 

435 440 445 

GAA CTC TCC GAT ACG AAO OCT OCC TOAT OAC C T T CACAATCGCC ATCOATGGTC 1456 
Glo Leu Ser Asp Thr Lys Ala Ala 
45 0 45 5 

CCGCTOCOOC CGGC AAGOGG ACOCTCTCGC OCCGTATCOC GGAGGTC T A T GOCTTT CATC 15 16 

ATCTCGATAC GGGCCTOACC TATCOCOCCA CGGCCAAAOC OCTOCTCOAT CGCGOCCTGT 1576 

CGCTTOATOA CGAGOCOGTT GCGGC CGA TO TCGCCCGCAA TCTCOATCTT GCCOGOCTCO 1636 

ACCOOTCOGT GCTOTCGGCC CATGCCATCG GCGAGGCGGC T TCG AAG AT C OCOOTCATOC 1696 

CCTCOOTOCO GCGOGCGCTG GTCGAOOCOC AGCOCAGCTT TGCGOCOCGT GAOCCOGOCA 1756 

CGGTGCTOGA T GO AC OCOAT AT CGGC A COG TGGTCTGCCC GGATOCOCCO GTOAAOCTCT 18 16 

ATGTCACCGC GTC AC COG A A OTGCGCOCOA AACOCCGCTA TGACGAAATC CTCGGCAATG 1876 

OCOOGTTGGC C OAT T AC GOG ACGATCCTCG AGGATATCCG CCOCCOCOAC OAOCOOGACA 1936 

TGOGTCGOGC OGACAOTCCT TTGAAGCCCO CCGACGATOC GCACTT 1982 



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 455 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Ser His Gly Ala Ser Ser Arg Pro Ala Thr Ala Arg Lys Ser Ser
1               5                   10                  15

Gly Leu Ser Gly Thr Val Arg Ile Pro Gly Asp Lys Ser Ile Ser His
            20                  25                  30

Arg Ser Phe Met Phe Gly Gly Leu Ala Ser Gly Glu Thr Arg Ile Thr
        35                  40                  45

Gly Leu Leu Glu Gly Glu Asp Val Ile Asn Thr Gly Lys Ala Met Gln
    50                  55                  60

Ala Met Gly Ala Arg Ile Arg Lys Glu Gly Asp Thr Trp Ile Ile Asp
65                  70                  75                  80

Gly Val Gly Asn Gly Gly Leu Leu Ala Pro Glu Ala Pro Leu Asp Phe
                85                  90                  95

Gly Asn Ala Ala Thr Gly Cys Arg Leu Thr Met Gly Leu Val Gly Val
            100                 105                 110

Tyr Asp Phe Asp Ser Thr Phe Ile Gly Asp Ala Ser Leu Thr Lys Arg
        115                 120                 125

Pro Met Gly Arg Val Leu Asn Pro Leu Arg Glu Met Gly Val Gln Val
    130                 135                 140

Lys Ser Glu Asp Gly Asp Arg Leu Pro Val Thr Leu Arg Gly Pro Lys
145                 150                 155                 160

Thr Pro Thr Pro Ile Thr Tyr Arg Val Pro Met Ala Ser Ala Gln Val
                165                 170                 175

Lys Ser Ala Val Leu Leu Ala Gly Leu Asn Thr Pro Gly Ile Thr Thr
            180                 185                 190

Val Ile Glu Pro Ile Met Thr Arg Asp His Thr Glu Lys Met Leu Gln
        195                 200                 205

Gly Phe Gly Ala Asn Leu Thr Val Glu Thr Asp Ala Asp Gly Val Arg
    210                 215                 220

Thr Ile Arg Leu Glu Gly Arg Gly Lys Leu Thr Gly Gln Val Ile Asp
225                 230                 235                 240

Val Pro Gly Asp Pro Ser Ser Thr Ala Phe Pro Leu Val Ala Ala Leu
                245                 250                 255

Leu Val Pro Gly Ser Asp Val Thr Ile Leu Asn Val Leu Met Asn Pro
            260                 265                 270

Thr Arg Thr Gly Leu Ile Leu Thr Leu Gln Glu Met Gly Ala Asp Ile
        275                 280                 285

Glu Val Ile Asn Pro Arg Leu Ala Gly Gly Glu Asp Val Ala Asp Leu
    290                 295                 300

Arg Val Arg Ser Ser Thr Leu Lys Gly Val Thr Val Pro Glu Asp Arg
305                 310                 315                 320

Ala Pro Ser Met Ile Asp Glu Tyr Pro Ile Leu Ala Val Ala Ala Ala
                325                 330                 335

Phe Ala Glu Gly Ala Thr Val Met Asn Gly Leu Glu Glu Leu Arg Val
            340                 345                 350

Lys Glu Ser Asp Arg Leu Ser Ala Val Ala Asn Gly Leu Lys Leu Asn
        355                 360                 365

Gly Val Asp Cys Asp Glu Gly Glu Thr Ser Leu Val Val Arg Gly Arg
    370                 375                 380

Pro Asp Gly Lys Gly Leu Gly Asn Ala Ser Gly Ala Ala Val Ala Thr
385                 390                 395                 400

His Leu Asp His Arg Ile Ala Met Ser Phe Leu Val Met Gly Leu Val
                405                 410                 415

Ser Glu Asn Pro Val Thr Val Asp Asp Ala Thr Met Ile Ala Thr Ser
            420                 425                 430

Phe Pro Glu Phe Met Asp Leu Met Ala Gly Leu Gly Ala Lys Ile Glu
        435                 440                 445

Leu Ser Asp Thr Lys Ala Ala
    450                 455
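
The coding-region coordinates given in the FEATURE entries of this listing can be cross-checked against the stated protein lengths with simple arithmetic. The following is a minimal sketch only: the helper name `codon_count` is hypothetical and not part of the patent, and it assumes that CDS coordinates are 1-based, inclusive, and exclude the stop codon.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of codons in a CDS spanning start..end.

    Assumes 1-based, inclusive coordinates with the stop codon excluded
    (an assumption made for this illustration).
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# CDS locations as printed in the listing, keyed by the nucleotide SEQ ID NO.
cds_locations = {
    2: (62, 1426),   # protein given as SEQ ID NO:3
    6: (34, 1380),   # protein given as SEQ ID NO:7
    10: (87, 317),   # protein given as SEQ ID NO:11
    12: (87, 401),   # protein given as SEQ ID NO:13
}

for seq_id, (start, end) in cds_locations.items():
    print(f"SEQ ID NO:{seq_id}: {codon_count(start, end)} codons")
```

Under these assumptions the four regions give 455, 449, 77 and 105 codons, which agree with the amino acid lengths stated for SEQ ID NOS:3, 7, 11 and 13.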



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1673 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 86..1432

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GTAGCCACAC AT A ATT AC T A TAOCTAOOAA OCCCOCTATC TCTCAATCCC OCGTOATCOC 60 

GCCAAAATGT OACTGTGAAA AATCC ATG TCC CAT TCT GCA TCC CCG AAA CCA 112 

Met Set His Ser Ala Ser Pro Lys Pro 
1 5 



GCA ACC GCC 

Ala Tbr Ala 
1 0 

OGC GAC AAG 

Gly Asp Lys 



TCO GGC GAA 
Set Gly Olo 



A AT ACA GGC 

An Thi Gly 
60 

GGC GAT OTC 

Gly Asp Va 1 

75 

CCC GAA OCT 

Pro Olo Ala 
9 0 

ACC ATG GGC 

Th r Mot Gly 



GAC GCC TCG 
Asp Ala Sor 



CGC GAA ATG 

Ar g Gin Mot 
1 40 

CTO ACQ CTG 

Loo Tbr Leo 
1 5 5 

CCG ATG GCC 

Pro Mo t Ala 
17 0 

AAC ACG CCG 

Asn Thr Pro 



CAC ACC OA A 
His Thr Olo 



ACC GAC AAG 
Thr Asp Lys 
2 20 



CGC CGC TCO 

Ar g Arg Sor 
1 5 

TCC ATC TCG 

Ser lie Ser 
3 0 

ACC CGC ATC 

Thr Arg lie 
45 

CGC GCC ATG 

Arg Ala Mot 



TOG ATC ATC 
Tr p lie lie 



GCG CTC GAT 

Ala Loo Asp 
9 5 

CTT OTC GGC 

Leo Va 1 Gly 
1 1 0 

CTG TCG AAG 

Leo Sor Lys 
1 2 5 

GGC GTT C AO 

Gly Va 1 Gin 



ATC GGC CCG 
lie Gly Pro 



TCC GCG CAG 

Ser Ala Gin 
17 5 

GGC GTC ACC 

Gly Va 1 Thr 
1 9 0 

AAO ATG CTO 

Lys Met Loo 
2 0 5 

GAT OGC G TO 

Asp Gly Va 1 



GAG GCA CTC 
Gin Ala Leo 



CAT CGC TCC 
His Arg Ser 



ACC OGC CTT 

Thr Gly Leo 
5 0 

C AO OCC ATG 

Gin Ala Me t 
65 

AAC OOC OTC 

Asn Gly Va 1 
8 0 

TTC GOC AAT 

P h c Gly Asn 



ACC TAT GAC 
Thr Tyr Asp 



CGC CCG ATG 

Arg Pro Met 
1 3 0 

OTO GAA OCA 

Va 1 Olo Ala 
1 45 

AAG ACG GCC 

Lys Thr Ala 
16 0 

OTA AAA TCC 

Val Lys Scr 



ACC OTC ATC 
Thr Val lie 



CAO GOC TTT 
O 1 n Gly Ph e 
2 1 0 

CGC CAT ATC 
Arg His lie 
225 



ACQ GOC GAA 

Thr Gly Gin 
20 

TTC ATG TTT 

P h e Me t P h c 
3 5 

CTG GAA OGC 

Loo Glo Gly 



GGC OCO AAA 
Gly Ala Lys 



GOC AAT GOC 

Gly Asn Gly 
85 

OCC GOA ACC 

Ala Gly Thr 
1 0 0 

ATG AAG ACC 

Me t Lys Thr 
115 

OGC COC OTO 

Gly Arg Val 



OCC GAT GGC 
Ala Asp Gly 



AAT CCG ATC 

Asn Pro lie 
165 

GCC GTG CTG 

Ala Val Leo 
1 80 

GAO CCG OTC 

Olo Pro Val 
195 

OOC OCC GAC 

Gly Ala Asp 



COC ATC ACC 
Arg lie Thr 



ATC COC ATT 
lie Arg lie 



GGC GOT CTC 

Gly Gly Leo 
40 

GAO GAC GTC 

Glo Asp Val 

5 5 

ATC CGT AAA 

lie Arg Lys 
7 0 

TGC CTO TTG 

Cys Leo Loo 



OGC OCO CGC 
Gly Ala Arg 



TCC TTT ATC 

Sor Phe lie 
1 2 0 

CTO AAC CCG 

Loo Asn Pro 
1 3 5 

GAC COC ATG 

Asp Arg Mo t 
1 50 

ACC TAT CGC 

Thr Tyr Arg 



CTC OCC GOT 
Leo Ala Gly 



ATG ACC CGC 

Me t Thr Arg 
2 0 0 

CTC ACQ GTC 

Leo Thr Val 
2 1 5 

GGC CAO OGC 

Gly Gin Gly 
23 0 



CCG 160 
Pro 
2 5 

GCA 208 
Ala 



ATC 256 
I 1 e 



GAO 304 

O 1 D 



CAO 35 2 

Ol n 



CTC 400 

Leo 

1 0 5 

OGC 44 8 

Gly 



TTG 496 
Leo 



CCG 544 
Pro 



GTG 59 2 

Va 1 



CTC 640 

Leo 

1 85 

GAC 68 8 

Asp 



OAO 736 
Ol u 



AAG 78 4 

Lys 



CTT GTC GGC CAG ACC ATC GAC GTG CCG GGC GAT CCG TCA TCG ACC GCC 832
Leu Val Gly Gln Thr Ile Asp Val Pro Gly Asp Pro Ser Ser Thr Ala
            235                 240                 245

TTC CCO CTC OTT OCC OCC CTT CTG OTO OA A GOT TCC OAC OTC ACC ATC 880 

Phc Pro Lou Val Ala Ala Lob Leo Val 0 1 it Gly Set Asp Va 1 Tbr lie 
250 255 2 60 265 



COC AAC GTG 
Ar g Am Val 



CAG OA A ATO 
Gin G 1 u Met 



GOC OA A OAC 

Gly Olu Asp 
3 00 

OTC OTC OTT 

Val Val Val 

3 1 5 

GTC CTG GCO 

Val Leu Ala 
3 3 0 

OGO CTC OAC 

Gly Leo Asp 



OCA CGC OGC 
Ala Ac 8 O 1 y 



TCG CTO ACO 

Ser Leo Tfar 
3 8 0 

ACO OTT GCA 

Tbr Val Ala 

3 95 

ATO OGC CTT 

Me t Gly Leo 
4 1 0 

ATC OCC ACO 

lie Ala Thr 



CTG ATO AAC 

Leo Met A s n 
2 7 0 

OOC OCC OAT 

Gly Ala Asp 
2 8 5 

GTC OCC GAT 

Val Ala Asp 



CCG CCO OA A 
Pro Pro 'Olu 



ATT OCC OCC 

lie Ala Ala 

3 3 5 

OA A CTO COC 

Glo Leo Aig 
3 5 0 

CTT GAA OCC 

Leo Olo Ala 
3 6 5 

GTT COC OOC 

Val Ar g Gly 



ACC CAT CTC 
Thr His Leo 



GCG GCG GAA 
Ala Ala Olo 
4 1 5 

TCC TTC CCC 
Ser Phe Pro 
4 3 0 



CCO ACC COT 
Pro Thr Arg 



ATC GAA GTG 

lie Glo Val 
290 

CTO COC OTC 

Leo Arg Val 
3 05 

COT OCO CCO 

Arg Ala Pro 
3 20 

TCC TTC OCO 

Ser Phe Ala 



GTC A AG OAA 
Val Ly s Olo 



AAC OGC OTC 

Asa O 1 y Val 

3 7 0 

COC CCC OAC 

Arg Pro Asp 
3 8 5 

OAT CAT COT 

Asp His Arg 
40 0 

AAG CCG OTO 

Ly s Pro Val 



OAA TTC ATO 
Olo Phe Me t 



ACC GGC CTC 

Thr Gly Leo 
2 7 5 

CTC AAT GCC 

Leo Aso Ala 



AOG OCT TCO 
Arg Ala Ser 



TCG ATO ATC 

Ser Me t lie 
3 2 5 

OAA OOC OAA 

Olo Ol y Olo 
3 40 

TCO GAT COT 

Ser Asp Arg 
3 5 5 

OAT TOC ACC 

Asp Cys Thr 



GGC AAG GGA 
O 1 y Ly s Gly 



ATC OCO ATO 

lie Ala Me t 
405 

ACQ GTT OAC 

Thr Val Asp 
4 20 

OAC ATG AT G 

Asp Me t Me t 
43 5 



ATC CTC ACC 

lie Leo Tbr 
2 80 

COT CTT OCA 

Arg Leo Ala 
29 5 

AAG CTC AAG 

Lys Leo Lys 
3 1 0 

OAC OAA TAT 

Asp Glo Tyr 



ACC GTO ATO 
Thr Val Met 



CTO OCA OCO 

Leo Ala Ala 
3 60 

OAA OOC OAO 

Olo Gly Olo 

3 75 

CTG GOC OOC 

Leo Oly Gly 
3 90 

AOC TTC CTC 

Ser Phe Leo 



OAC AG T AAC 
Asp Ser Asn 



CCG OGA TTG 
Pro Gly Leo 
440 



TTG 928 
Leo 



OOC 9 76 

Oly 



OOC 10 24 

Oly 



CCO 1072 
Pro 



OAC 1120 

Asp 

3 45 

OTC 116 8 

Va 1 



ATG 1216 
Me t 



GGC 12 64 

oiy 



OTG 13 12 

Val 



ATO 13 60 

Me t 

425 

GGC 14 08 

Gly 



OCA AAG ATC OAO TTG AOC ATA CTC TAOTCACTCG AC AOCG A AAA T ATT ATT TOC 1462 
Ala Lys lie Ola Lea Ser lie Leo 
445 



OAOATTGGGC ATTATTACCO GTTGGTCTCA GCOGOGGTTT AATOTCC AAT CTTCCATACG 1522 

T A AC AOC ATC AOG AA AT ATC AAAAAAGCTT TAOAAGOAAT TOCTAOAOCA OCGACOCCOC 15 82 

CTAAOCTTTC TCAAGACTTC GTTAAAACTO TACTGAAATC CCOOGGGGTC CGOGOATCAA 1642 

ATGACTTCAT TTCTGAOAAA TTOGCCTCOC A 1673 



(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 449 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Ser His Ser Ala Ser Pro Lys Pro Ala Thr Ala Arg Arg Ser Olo 
1 5 10 15 



Ala Leo Thr Gly Glo lie Arg lie Pro Oly Asp Lyi Ser lie Ser His 

2 0 2 5 3 0 



65 



Leu 
5 0 



Pb c Mot 
3 5 

Leu oio 

Oiy Ala 



Val Gly Asn 



Ala Gly 
10 0 



Asp 



Mo t 
I 30 



Al a 



Met Lyi 
1 1 5 

Gly Arg 

Ala Asp 



Ala Asa Pro 



S e r 



I 1 o 



Pbo 

2 10 



Pro 
Va 1 
Arg 



Va 1 

290 



Ala Val 
18 0 

OU Pro 
1 9 5 

Gly Ala 

Arg lie 

Gly Asp 



Gin Gly 
26 0 

Tb r Gly 
2 7 5 



Pro Sot Met 



Al a 

GU 

Val 
3 70 

Asp 
Arg 
Va 1 
Ph e Me t 



Pb 



G 1 



Ol 
8 

Th 



Th 
Va 
Gl 



L e 



Va 



Th 



Le 



Leo A s n Al 



Val Arg Ala Se 



I 1 

3 2 5 



Gin Gly 
3 40 

S e r Asp 

3 5 5 

Asp Cy s 

Gly Lys 

lie Ala 



Th r Val 
42 0 

Asp Met 

435 



Th 



G 1 



Me 
40 



Me 



Oly 
Ol n 



I 1 e 
70 



Gly 



Asp 

5 5 



Leo 
40 



Va 1 



S c r P h e 



Leo 



Asp 
1 5 0 



Tb r 



As a 

t 3 5 

Arg 
Ty r 



Leo Ala 



Met Tbr 



Th r 
2 1 5 



Oly 
23 0 



Gin Gly 



lie L e n 



Arg 



Lys 
3 1 0 

Asp 



L e n 

295 



Leo 
Gl n 



Tbr Val Met 



Len Ala 



Gl i 



Len 
3 9 0 



ier Phe 



Al a 
3 60 



Oly 
3 7 5 

Oly 



Pro Oly 



5,633,435 
Ala Ser Gly Gin 
lie Aid Th r 



66 



Arg Lys Ol o 



Cys Leo Leo Gin 
Oly 



Ala Arg 



I 1 e 
1 2 0 



Pro 
Me t 



Len 
105 

Oly 



Arg Val 
Oly 



Arg 
200 

Val 



Ser Ser Thr 



Asp Val Thr 



Thr 
2 80 

Al a 
Lys 
T y r 



Va 1 
Gin Me t 



Oly 
Len 



Asp Ser Asn 



Oly Asp 

75 

Pro O 1 o 
90 

Thr Met 
Asp Ala 



Len Arg Gin 



Pro 



Len Thr 
155 



Pro Me t 
1 7 0 



Len 
1 8 5 

Asp 

Ol n 

Ly . 

Al a 

I 1 e 
265 

Len 
Oly 
Ol y 
Pro 



Asp 
343 



Oly 

60 

Va 1 



Ala 

Gly 



Me t 
140 



Len 



His Tbr 
Thr Asp 



Len Val 

235 



Phe Pro 
2 5 0 



Gly O 1 o 

Val Val 
3 1 5 

Val Len 

3 3 0 

Gly Leu 

Ala Arg 

Ser Len 



Oly Thr Val 

3 95 

Val Met Oly 
4 1 0 

Me t lie Ala 
423 



Thr Arg 
45 

Arg Ala 

T r p lie 

Ala Len 

Leo Val 
1 1 0 

Len Ser 
125 

Oiy Val 

lie Oly 



lie Tbr 



Met Gin 



I 1 e 

Asp 

9 5 

Gly 
Ly s 



Ala Ser Ala 



Asn Thr Pro 



Oli 



Ly . 
2 2 0 



Oly Val 
1 90 

Lys Met 
2 0 5 



Len Val Ala 



Arg Asn Val 
Gin Gin Me t 



Asp 
3 0 0 



Len Met 
2 70 

Oly Ala 
285 

Val Ala 



Ala 
Asp 
Oly 



Thr 
3 8 0 



L o n 



Th r 



Leu 
440 



Gly Ala Lys lie 



Asn 

80 

Phe 
Thr 
Ar g 



Gin Val 



Pr . 



Gin 
1 7 5 



Lys 
1 6 0 

Va 1 



Thr Tbr 



Len Gin 



Asp Oly Val Arg 



Oly Ol n Thr lie 



Al a 

255 



Asp 
240 

Leu 



Asn Pro 



Asp 
Asp 



Val Pro Pro Oln 



lie Ala Ala 

3 3 5 

Gin Len Arg 

3 5 0 

Leu Gin Ala 

3 6 5 



I 1 e 



Len 



Ar g 

3 2 0 



Ser 



Va 1 



As n 



Val Arg Oly Arg 



Ala Thr His Len 



Ala Ala Ol u 
4 1 5 



Asp 

40 0 

Lys 



Ser Phe Pro Gin 
43 0 



Oln Len Ser 
445 



I 1 e 



Leu 



(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1500 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 34..1380

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

OTOATCOCOC CAAAATGTGA CTGTGAAAAA TCC ATO TCC CAT TCT GCA TCC CCO 54 

Met Sor His Sor Ala Scr Pro 
1 5 

AAA CCA OCA ACC OCC COC CGC TCO GAG GCA CTC ACG GOC G A A AT C COC 102 
Lyi Pro Ala Thr Ala Arg A r g Scr GU Ala Leu Tbr Gly Gin lie Arg 
10 15 2 0 

ATT CCG GOC G AC AAG TCC ATC TCG CAT COC TCC TTC ATO TTT GOC GOT 150 
lie Pro Gly Asp Lys Ser lie Scr His Arg Ser Phe Met Phe Gly Gly 
25 3 0 3 5 

CTC GCA TCG GOC GAA ACC CGC ATC ACC GGC CTT CTG OA A GOC GAG G AC 198 
Leu Ala Ser Gly Gin Thr Arg lie Thr Gly Leo Leu Glu Gly Glu Asp 
40 45 50 55 

GTC ATC AAT ACA GGC CGC GCC ATG CAG GCC ATG GGC GCG AAA ATC COT 246 
Val lie Asa Thr Gly Arg Ala Met Gin Ala Met Gly Ala Lys lie Arg 
60 65 70 

AAA GAG GOC OAT GTC TOO ATC ATC AAC OGC GTC GOC AAT GOC TOC CTG 294 
Lys Glu Gly Asp Val Trp lie lie Asa Oly Val Gly Asn Gly Cys Leu 
75 8 0 85 

T TO CAG CCC GAA OCT GCG CTC GAT TTC GGC AAT GCC GGA ACC OGC GCG 342 
Leu Gin Pro Glu Ala Ala Leu Asp Phe Oly Asn Ala Oly Tbr Gly Ala 
90 9 5 10 0 

COC CTC ACC ATO GOC CTT GTC GOC ACC TAT G AC ATG AAG ACC TCC TTT 390 
Arg Leu Thr Met Gly Leu Val Gly Thr Tyr Asp Met Lys Thr Ser Phe 
10 5 110 115 

ATC GOC GAC OCC TCG CTG TCO AAG CGC CCO ATG GGC COC OTO CTO AAC 43 8 

lie Gly Asp Ala Ser Lou Ser Lys Arg Pro Met Gly Arg Val Leo Asn 
120 125 130 135 

CCO TTG COC GAA ATO GOC CTT CAO OTO GAA GCA OCC GAT GGC GAC CGC 486 
Pro Leu Arg Glu Met Gly Val Gin Val Glu Ala Ala Asp Gly Asp Aig 
140 145 15 0 

ATG CCO CTO ACG CTG ATC GOC CCG AAG AC O GCC AAT CCG ATC ACC TAT 534 
Met Pro Leo Thr Leu lie Gly Pro Lys Thr Ala Asn Pro lie Thr Tyr 
15 5 160 165 

CGC OTO CCO ATG GCC TCC OCO CAG GTA AAA TCC OCC OTO CTO CTC GCC 582 
Arg Val Pro Met Ala Ser Ala Gin Val Lys Ser Ala Val Leu Leu Ala 
17 0 17 5 18 0 

GOT CTC AAC ACG CCG GOC OTC ACC ACC GTC ATC GAO CCO GTC ATO ACC 630 
Oly Leu Asn Thr Pro Gly Val Thr Thr Val lie Gin Pro Val Met Thr 
185 190 195 

CGC GAC CAC ACC GAA AAG ATG CTG CAG OGC TTT OGC GCC GAC CTC ACG 678 
Arg Asp His Thr Glu Lys Met Leu Gin Oly Phe Oly Ala Asp Leu Thr 
200 205 210 215 

GTC GAG ACC GAC AAG OAT GGC OTG CGC CAT ATC CGC ATC ACC OGC CAG 726 
Val Glu Thr Asp Lys Asp Oly Val Arg His lie Arg lie Thr Oly Gin 
220 225 230 

GGC AAG CTT GTC GGC CAG ACC ATC GAC GTG CCG GGC GAT CCG TCA TCG 774
Gly Lys Leu Val Gly Gln Thr Ile Asp Val Pro Gly Asp Pro Ser Ser
                235                 240                 245

ACC GCC TTC CCG CTC OTT GCC OCC CTT CTO OTO OA A GOT TCC OAC GTC 822 

Thr Ala Phe Pro Lev Va 1 Ala Ala Leo Leu Val OU Oly Ser Asp Val 

250 255 260 

ACC AT C COC AAC G TO CTO ATO AAC CCG ACC COT ACC OOC CTC ATC CTC 870 

Thr lie Arg Asi Val Lev Met Asn Pro Thr Arg Th r Oly Leo lie Leo 

265 270 275 

ACC TTO CAO OA A ATO OGC OCC OAT ATC OAA OTO CTC A AT OCC COT CTT 9 18 

Thr Leo Ola Glo Met Oly Ala Asp lie Glo Val Leo Asa Ala Arg Leo 

280 285 2 90 295 

GCA GOC GOC G A A OAC OTC GCC GAT CTO CGC GTC AGO OCT TCO A AG CTC 966 

Ala Gly Gly Glo Asp Val Ala Asp Leo Arg Val Arg Ala Scr Lys Leo 

300 305 3 10 

AAO OGC OTC OTC OTT CCO CCG OAA COT OCO CCG TCG ATO ATC OAC OAA 10 14 

Lys Oly Val Val Val Pro Pro OU Arg Ala Pro Ser Met lie Asp Glo 

315 320 325 

TAT CCG GTC CTO OCO ATT OCC OCC TCC TTC OCO OAA OGC OAA ACC OTO 1062 

Tyr Pro Val Leo Ala lie Ala Ala Ser Phe Ala Olo Oly Olo Thr Val 

330 335 3 40 

ATO OAC OGO CTC OAC OAA CTO COC OTC AAO OAA TCO OAT COT CTO OCA 1110 

Met Asp Gly Leo Asp Olo Leo Arg Val Lys Olo Ser Asp Arg Leo Ala 

345 350 355 

OCO OTC OCA COC OOC CTT OAA OCC AAC OOC OTC OAT TOC ACC OAA OOC 1158 

Ala Val Ala Arg Gly Leo Olo Ala Asn Gly Val Asp Cyi Thr Glo Oly 

360 365 370 375 

GAG ATO TCO CTO ACO OTT CGC GOC COC CCC OAC OOC AAO OGA CTO OOC 1206 

Glo Met Ser Lon Thr Val Arg Gly Arg Pro Asp Oly Lys Oly Leo Oly 

380 385 390 

OOC OGC ACO OTT OCA ACC CAT CTC OAT CAT COT ATC OCG ATO AOC TTC 1254 

Oly Gly Thr Val Ala Thr Hi i Leo Asp Bis Arg lie Ala Met Ser Phe 

395 400 405 

CTC OTG ATO OGC CTT OCO OCG OAA A AG CCG OTO ACO OTT OAC OAC A GT 1302 

Leo Val Met Oly Leo Ala Ala Olo Lys Pro Val Thr Val Asp Asp Ser 

4 10 4 15 4 2 0 

AAC ATO ATC OCC ACO TCC TTC CCC OAA TTC ATG OAC ATO ATO CCO OOA 13 50 

Asn Met lie Ala Thr Ser Phe Pro Olo Phe Met Asp Met Met Pro Oly 

425 430 435 

TTO OOC GCA AAO ATC OAO TTO AOC ATA CTC TAOTCACTCO ACAOCGAAAA 1400 

Leo Oly Ala Lys lie Olo Leo Ser lie Leo 

440 445 

TATTATTTGC GAGAT TGGGC ATTATTACCO GTTGGTCTCA OCOGGGGT T T AATOTCCAAT 1460 

CTTCCATACO TAACAGCATC AGGAAAT ATC AA A A AAGCTT 1500 



(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 449 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Met Ser His Ser Ala Ser Pro Lys Pro Ala Thr Ala Arg Arg Ser Olo 
15 10 15 

Ala Leo Thr Oly Olo lie Arg lie Pro Oly Asp Lys Ser lie Ser His 
2 0 2 5 3 0 

Arg Ser Phe Met Phe Gly Oly Leo Ala Ser Oly Olo Thr Arg lie Thr 
35 40 45 



Gly Leo Leo Olo Oly Olu Asp Val lie Asn Thr Oly Arg Ala Met Oln 



Ala Mot Oly Ala Lys lie Arg Lys Gin Gly Asp Val Trp Ilo lie A a n 
6 5 70 75 80 

Oly Val Gly Aid Gly Cys Leo Leo Gin Pro Glo Ala Ala Leo Asp Phe 
S5 90 95 

Gly Asn Ala Gly Thr Oly Ala Arg Leo Thr Met Gly Leo Val Gly Thr 
1 0 0 1 0 5 - 1 1 0 

Tyr Asp Met Lys Thr Ser Phe lie Gly Asp Ala Sei Leu Set Lys Arg 
115 120 125 

Pro Mot Gly Arg Val Leo Am Pro Leo Arg Glo Met Gly Val Gla Val 
13 0 13 5 140 

Glo Ala Ala Asp Gly Asp Arg Met Pro Leo Thr Leo lie Gly Pro Lys 
145 150 155 160 

Thr Ala Asa Pro lie Thr Tyr Arg Val Pro Met Ala Ser Ala Glo Val 
165 170 175 

Lys Ser Ala Val Leo Leu Ala Gly Leo Asn Thr Pro Gly Val Thr Thr 
18 0 18 5 19 0 

Val lie Glo Pro Val Met Thr Arg Asp His Thr Glo Lys Met Leo Glo 
195 200 205 

Gly Phe Gly Ala Asp Leo Thr Val Glo Thr Asp Lyi Asp Gly Val Arg 
2 10 2 15 2 2 0 

His lie Arg lie Thr Gly Gin Gly Lys Leo Val Gly Gin Thr lie Asp 
225 230 235 240 

Val Pro Oly Asp Pro Ser Ser Thr Ala Phe Pro Leo Val Ala Ala Leo 
245 250 255 

Leo Val Olo Gly Ser Asp Val Thr lie Arg Asn Val Leo Met Asn Pro 

260 265 270 

Thr Arg Thr Oly Leo lie Leo Thr Leu Oln Glu Met Gly Ala Asp lie 
275 280 285 

Glo Val Leo Asn Ala Arg Leo Ala Gly Gly Glo Asp Val Ala Asp Leo 

290 295 300 

Arg Val Arg Ala Ser Lys Leo Lys Gly Val Val Val Pro Pro Glu Arg 
305 310 315 320 

Ala Pro Ser Met lie Asp Gla Tyr Pro Val Leo Ala lie Ala Ala Sor 

325 330 335 

Phe Ala Olo Gly Glo Thr Val Met Asp Gly Leo Asp Glo Leu Arg Val 
3 40 345 350 

Lys Glo Ser Asp Arg Leo Ala Ala Val Ala Arg Gly Leu Gin Ala Asn 
355 360 365 

Gly Val Asp Cys Thr Glo Gly Glo Met Ser Leo Thr Val Arg Gly Arg 
370 375 380 

Pro Asp Gly Lys Gly Leo Oly Oly Oly Thr Val Ala Thr His Leo Asp 
385 390 395 400 

His Arg lie Ala Met Ser Phe Leo Val Met Gly Leo Ala Ala Olo Lys 
405 410 4 15 

Pro Val Thr Val Asp Asp Ser Asn Met lie Ala Thr Ser Phe Pro Oln 
420 425 430 

Phe Met Asp Met Met Pro Gly Leo Gly Ala Lys lie Glo Leo Ser lie 
435 440 445 

Leo 

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 423 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Set Leo Thr Lou Gin Pro lie Ala 



Pro Gly Ser Lys Thr Val 
2 0 

Ala Bis Oly Lys To r Val 

3 5 

Arg His Met Leo Am Ala 
5 0 



Ser A j n 



Arg Val Asp 
1 0 

Arg Ala Leo 
2 5 



Oly Tb r lie 



A s n 

1 5 



Leo Leo 



A 1 a 

3 0 



Leo 



Leo 
5 5 



Thr 
40 



Asa Leo Leo Asp 



Ser 
45 



Thr Ala Leo Oly 



Ser Ala Asp Arg Thr Arg Cyi Olo 



65 



70 



His Ala Olo Oly Ala Leo 
8 5 

Met Arg Pro Leo Ala Ala 
1 00 

Leo Thr Oly Olo Pro Arg 
1 1 5 



I 1 e 

Olo Leo Phe 



lie Oly 
75 

Leo Oly 
90 



Ala Lea 



Cy s 
105 



Me t 



Asp Ala Leo Arg Leo Oly 
1 3 0 



Oly 
1 3 5 



Ly » 
1 2 0 

Al a 



Leo Oly 
Olo Arg Pro 



Val Ser 
60 

Asa Oly 



Am Ala 



Ser Asa 



Pro 



Thr 
95 



Asp 
1 1 0 



Ly s 

Asa Tyr Pro Pro Leo Arg Leo Ola Oly 



145 



1 5 0 



Asp Val Asp Oly Ser Val 
1 6 5 



Ser Ser Olo 



lie Thr 



Oly Phe 
1 5 5 



Phe Leo 
1 70 



I 1 e 



Tyr 
14 0 



0 1 y 

1 25 



Thr Oly Oly Am 



Thr Ala Leo 



Leo 
1 7 5 



Thr Ala Pro Leo Ala Pro Olo Asp 
1 80 

Leo Val Ser Lys Pro Tyr lie Asp 

19 5 2 0 0 

Phe Oly Val Olo lie Olo Asa 

2 10 2 15 



Th r 
18 5 



Val lie Arg lie 



Lys 
19 0 



lie Thr Leo Asa 



L en 
2 05 



Gin His Tyr Gin 



01 n 

2 20 



Gly Gly Ola Ser Tyr Gin Ser Pro Gly Thr 



Tyr 
23 5 



Ala Ser Ser Ala Ser Tyr 
2 4 5 

Thr Val Lys Val Thr Oly 
2 60 



Phe Leo Ala 



Al a 
2 5 0 



Ala Ala lie Lys 



lie Oly 



Arg Phe Ala Asp Val Lea Olo 

2 7 5 



Lys 
2 8 0 



Ar g 

265 

Me t 



Asa Ser Met Gin 



Asp Asp Tyr lie Ser Cy i 
29 0 



Thr 

2 9 5 



Arg Gly 



Gly Ala Thr 
O 1 o 



1 1 e 

2 8 5 



Oly 

27 0 

Cy s 



Asp Met Asa His lie Pro Asp Ala Ala Mot 



3 05 



3 1 0 



Leo Phe Ala Lys Gly Thr 

3 2 5 

Val Lys Glo Thr Asp Arg 
3 40 

Val Oly Ala Glo Val Glo 

3 5 5 

Pro Glo Lys Leo Asn Phe 
370 



Thr Arg Leo 



Ar e 

3 3 0 



Leo 



Thr 

3 1 5 



Asn 



As n 

3 0 0 



Ala lie 



G 1 y 
2 5 5 

Asp 
Tr p 
Asp 



Leo Phe 



Al a 
3 4 5 



O 1 o 



Ala 

3 75 



oi y 

360 



Hi s 
Olo lie 



Met Ala 



Asp Tyr 



Ala Thr 



Leo 



Ala Lea 



Asp Asp Val 
Tyr Thr Leo 
Oly 
Oly 



Leo 
8 0 



Al a 



lie Val 



His Leo Val 



Lea Olo Oln Gla 



Va I 
160 



Me t 



Asp 



Oly 
Me t Lys Thr 
Pb e Val Val Lys 
Leo Val Olo Oly 



lie Ala Thr Ala 



lie Tyr Asn Trp 

3 3 5 

Thr Olo Leo Arg 
3 5 0 



Asp 

2 40 

Oly 



1 1 e 
Gly 
Me t 

Ala 

3 20 

Arg 
Lys 



lie Arg lie Thr Pro 
3 65 

Tyr Asa Asp His Arg 
380 



Met Ala Met Cy i Phe Ser Leu Val Ala Leu Sor Asp Thr Pro Val Thr 
385 390 395 400 

lie Lett Aip Pro Lys Cy» Thr Ala Lys Thr Phe Pro Asp Tyr Phe Olu 
405 410 415 

Gin Leo Ala Arg lie Ser Ola 
420 

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1377 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CCATOOCTCA CGOTOCAAOC AOCCOTCCAO CAACTOCTCO T AAGTCCTCT OOTCTTTCTO 60 

GAACCGTCCG TATTCCAGGT GACAAGTCTA TCTCCCACAO OTCCTTCATO T T TOO A GOT C 120 

TCOCTAOCOG TGAAACTCGT ATCACCGGTC TTTTGOAAOG TGAAGATGTT ATC AACACTG 180 

OTAAOOCTAT OCAAOCTATO OOTOCCAOAA T C COT A AGO A AGOTGATACT TGGATCATTO 2 40 

ATGGTGTTOG TAACGGTGGA CTCCTTOCTC CTOAOGCTCC TCTCGATTTC GGTAACGCTG 3 00 

CAACTOGTTG CCOTTTGACT ATGGGTCTTO TTGOTOTTTA CGATTTCOAT AGCACTTTCA 360 

TTOOTGACGC TTCTCTCACT AAGCGTCCAA TOGOTCGTOT GTTGAACCCA CTTCGCOAAA 420 

TGGGTGTGCA GGTOAAGTCT GAAGACGGTG ATCGTCTTCC AGTTACCTTG CGTGGACCAA 480 

AGACTCCAAC GCCAATCACC TACAGGGTAC CTATGGCTTC CGCTCAAGTG AAGTCCGCTG 540 

TTCTGCTTGC TGOTCTCAAC ACCCCAOOTA TCACCACTGT TATCOAGCCA ATCATOACTC 600 

GTGACCACAC TGAAAAGATO CTTCAAGOTT TTGGTGCTAA CCTTACCOTT GAGACTGATG 660 

CTGACGGTGT OCOTACCATC CGTCTTOAAG GTCGTGGTAA GCTCACCGGT CAAGTOATTO 720 

ATGTTCCAGO TGATCCATCC TCTACTGCTT TCCCATTOOT TOCTOCCTTO CTTGTTCCAG 780 

GTTCCGACGT CACCATCCTT AACOTTTTGA TGAACCCAAC CCOTACTOOT CTCATCTTGA 840 

CTCTGCAOOA AAT GGGTOCC GACATCGAAG TOATCAACCC ACGTCTTGCT GGTGGAGAAG 900 

ACGTGGCTOA CTTGCGTGTT COTTCTTCTA CT T TGAAGOG TGTTACTOTT CCAGAAGACC 960 

GTGCTCCTTC TATOATCGAC GAGTATCCAA TTCTCGCTOT TGCAOCTGCA TTCOCTOAAO 1020 

GTGCTACCOT TATGAACOGT TTGGAAGAAC TCCGTGTTAA GGAAAGCGAC CGTCTTTCTO 10 80 

CTGTCGCAAA CGGTCTCAAG CTCAACOGTO TTGATTGCGA TGAAGGTGAO ACTTCTCTCG 1140 

TCGTGCGTOG TCGTCCTGAC GGTAAGGGT C TCGGTAACOC TTCTGGAGCA GCTGTCOCTA 1200 

CCCACCTCGA TCACCOTATC OCTATOAGCT TCCTCGTTAT GGGTC T CGTT TCTGAAAACC 1260 

CTGTTACTOT TGATOATGCT ACTATGATCG CTACTAGCTT CCCAOAGTTC ATGOATTTGA 1320 

TGGCTGGTCT TGGAGCTAAG ATCGAACTCT CCGACACTAA GGCTGCTTOA TGAOCTC 1377 

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 318 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 87..317

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

AGATCTATCG ATAAGCTTGA TGTAATTGGA GGAAGATCAA AATTTTCAAT CCCCATTCTT 60

CGATTGCTTC AATTGAAGTT TCTCCG ATG GCG CAA GTT AGC AGA ATC TGC AAT 113
                             Met Ala Gln Val Ser Arg Ile Cys Asn
                             1               5

GGT GTG CAG AAC CCA TCT CTT ATC TCC AAT CTC TCG AAA TCC AGT CAA 161
Gly Val Gln Asn Pro Ser Leu Ile Ser Asn Leu Ser Lys Ser Ser Gln
10                  15                  20                  25

CGC AAA TCT CCC TTA TCG GTT TCT CTG AAG ACG CAG CAG CAT CCA CGA 209
Arg Lys Ser Pro Leu Ser Val Ser Leu Lys Thr Gln Gln His Pro Arg
                30                  35                  40

GCT TAT CCG ATT TCG TCG TCG TGG GGA TTG AAG AAG AGT GGG ATG ACG 257
Ala Tyr Pro Ile Ser Ser Ser Trp Gly Leu Lys Lys Ser Gly Met Thr
            45                  50                  55

TTA ATT GGC TCT GAG CTT CGT CCT CTT AAG GTC ATG TCT TCT GTT TCC 305
Leu Ile Gly Ser Glu Leu Arg Pro Leu Lys Val Met Ser Ser Val Ser
        60                  65                  70

ACG GCG TGC ATG C 318
Thr Ala Cys Met
    75
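
The amino acid line printed beneath each codon line in this listing follows from the standard genetic code. The following is a minimal sketch of that lookup, applied to the first nine codons of the coding region of SEQ ID NO:10 as read from the listing. The function name `translate` and the table construction are illustrative assumptions, not part of the patent.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# One-letter to three-letter amino acid codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> list[str]:
    """Translate a coding DNA string into three-letter amino acid codes.

    Translation stops at the first stop codon; trailing bases that do not
    form a complete codon are ignored.
    """
    dna = dna.upper().replace(" ", "")
    residues = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return residues


# First nine codons of the SEQ ID NO:10 coding region.
print(" ".join(translate("ATG GCG CAA GTT AGC AGA ATC TGC AAT")))
```

For these nine codons the sketch yields Met Ala Gln Val Ser Arg Ile Cys Asn, matching residues 1 to 9 shown for SEQ ID NO:10.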

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Met Ala Gln Val Ser Arg Ile Cys Asn Gly Val Gln Asn Pro Ser Leu
1               5                   10                  15

Ile Ser Asn Leu Ser Lys Ser Ser Gln Arg Lys Ser Pro Leu Ser Val
            20                  25                  30

Ser Leu Lys Thr Gln Gln His Pro Arg Ala Tyr Pro Ile Ser Ser Ser
        35                  40                  45

Trp Gly Leu Lys Lys Ser Gly Met Thr Leu Ile Gly Ser Glu Leu Arg
    50                  55                  60

Pro Leu Lys Val Met Ser Ser Val Ser Thr Ala Cys Met
65                  70                  75

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 402 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 87..401

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

AGATCTATCG ATAAGCTTGA TGTAATTGGA GGAAGATCAA AATTTTCAAT CCCCATTCTT 60 

CGATTGCTTC AATTGAAGTT TCTCCG ATG GCG CAA GTT AGC AGA ATC TGC AAT 113 
Met Ala Gln Val Ser Arg Ile Cys Asn 
1 5 

GGT GTG CAG AAC CCA TCT CTT ATC TCC AAT CTC TCG AAA TCC AGT CAA 161 
Gly Val Gln Asn Pro Ser Leu Ile Ser Asn Leu Ser Lys Ser Ser Gln 
10 15 20 25 

CGC AAA TCT CCC TTA TCG GTT TCT CTG AAG ACG CAG CAG CAT CCA CGA 209 
Arg Lys Ser Pro Leu Ser Val Ser Leu Lys Thr Gln Gln His Pro Arg 
30 35 40 

GCT TAT CCG ATT TCG TCG TCG TGG GGA TTG AAG AAG AGT GGG ATG ACG 257 
Ala Tyr Pro Ile Ser Ser Ser Trp Gly Leu Lys Lys Ser Gly Met Thr 
45 50 55 

TTA ATT GGC TCT GAG CTT CGT CCT CTT AAG GTC ATG TCT TCT GTT TCC 305 
Leu Ile Gly Ser Glu Leu Arg Pro Leu Lys Val Met Ser Ser Val Ser 
60 65 70 

ACG GCG GAG AAA GCG TCG GAG ATT GTA CTT CAA CCC ATT AGA GAA ATC 353 
Thr Ala Glu Lys Ala Ser Glu Ile Val Leu Gln Pro Ile Arg Glu Ile 
75 80 85 

TCC GGT CTT ATT AAG TTG CCT GGC TCC AAG TCT CTA TCA AAT AGA ATT 401 
Ser Gly Leu Ile Lys Leu Pro Gly Ser Lys Ser Leu Ser Asn Arg Ile 
90 95 100 105 

C 402 
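
The CDS feature of SEQ ID NO:12 (LOCATION: 87..401) can be cross-checked against the length of the translated protein listed as SEQ ID NO:13. The following is a minimal sketch of that arithmetic only; the helper name `cds_codon_count` is hypothetical and is not part of the listing.

```python
def cds_codon_count(start: int, end: int) -> int:
    """Number of codons spanned by a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"CDS {start}..{end} is not a whole number of codons")
    return length // 3


# CDS coordinates as stated in the FEATURE blocks of the listing.
seq12_codons = cds_codon_count(87, 401)   # compare with SEQ ID NO:13 (105 amino acids)
seq14_codons = cds_codon_count(14, 232)   # compare with SEQ ID NO:15 (73 amino acids)
seq16_codons = cds_codon_count(49, 351)   # compare with SEQ ID NO:17 (101 amino acids)
```

Where a CDS runs through a stop codon, the codon count is one more than the protein length, so a difference of exactly one is expected in that case.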

( 2 ) INFORMATION FOR SEQ ID NO:13: 

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 105 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: protein 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:13: 

Met Ala Gln Val Ser Arg Ile Cys Asn Gly Val Gln Asn Pro Ser Leu 
1 5 10 15 

Ile Ser Asn Leu Ser Lys Ser Ser Gln Arg Lys Ser Pro Leu Ser Val 
20 25 30 

Ser Leu Lys Thr Gln Gln His Pro Arg Ala Tyr Pro Ile Ser Ser Ser 
35 40 45 

Trp Gly Leu Lys Lys Ser Gly Met Thr Leu Ile Gly Ser Glu Leu Arg 
50 55 60 

Pro Leu Lys Val Met Ser Ser Val Ser Thr Ala Glu Lys Ala Ser Glu 
65 70 75 80 

Ile Val Leu Gln Pro Ile Arg Glu Ile Ser Gly Leu Ile Lys Leu Pro 
85 90 95 

Gly Ser Lys Ser Leu Ser Asn Arg Ile 
100 105 



( 2 ) INFORMATION FOR SEQ ID NO:14: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 233 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: double 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA (genomic) 

( i x ) FEATURE: 

( A ) NAME/KEY: CDS 
( B ) LOCATION: 14..232 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:14: 

AGATCTTTCA AGA ATG GCA CAA ATT AAC AAC ATG GCT CAA GGG ATA CAA 49 
Met Ala Gln Ile Asn Asn Met Ala Gln Gly Ile Gln 
1 5 10 

ACC CTT AAT CCC AAT TCC AAT TTC CAT AAA CCC CAA GTT CCT AAA TCT 97 
Thr Leu Asn Pro Asn Ser Asn Phe His Lys Pro Gln Val Pro Lys Ser 
15 20 25 

TCA AGT TTT CTT GTT TTT GGA TCT AAA AAA CTG AAA AAT TCA GCA AAT 145 
Ser Ser Phe Leu Val Phe Gly Ser Lys Lys Leu Lys Asn Ser Ala Asn 
30 35 40 

TCT ATG TTG GTT TTG AAA AAA GAT TCA ATT TTT ATG CAA AAG TTT TGT 193 
Ser Met Leu Val Leu Lys Lys Asp Ser Ile Phe Met Gln Lys Phe Cys 
45 50 55 60 

TCC TTT AGG ATT TCA GCA TCA GTG GCT ACA GCC TGC ATG C 233 
Ser Phe Arg Ile Ser Ala Ser Val Ala Thr Ala Cys Met 
65 70 

( 2 ) INFORMATION FOR SEQ ID NO:15: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 73 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: protein 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:15: 

Met Ala Gln Ile Asn Asn Met Ala Gln Gly Ile Gln Thr Leu Asn Pro 
1 5 10 15 

Asn Ser Asn Phe His Lys Pro Gln Val Pro Lys Ser Ser Ser Phe Leu 
20 25 30 

Val Phe Gly Ser Lys Lys Leu Lys Asn Ser Ala Asn Ser Met Leu Val 
35 40 45 

Leu Lys Lys Asp Ser Ile Phe Met Gln Lys Phe Cys Ser Phe Arg Ile 
50 55 60 

Ser Ala Ser Val Ala Thr Ala Cys Met 
65 70 


( 2 ) INFORMATION FOR SEQ ID NO:16: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 352 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: double 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA (genomic) 

( i x ) FEATURE: 

( A ) NAME/KEY: CDS 
( B ) LOCATION: 49..351 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:16: 

AGATCTGCTA GAAATAATTT TGTTTAACTT TAAGAAGGAG ATATATCC ATG GCA CAA 57 
Met Ala Gln 
1 

ATT AAC AAC ATG GCT CAA GGG ATA CAA ACC CTT AAT CCC AAT TCC AAT 105 
Ile Asn Asn Met Ala Gln Gly Ile Gln Thr Leu Asn Pro Asn Ser Asn 
5 10 15 

TTC CAT AAA CCC CAA GTT CCT AAA TCT TCA AGT TTT CTT GTT TTT GGA 153 
Phe His Lys Pro Gln Val Pro Lys Ser Ser Ser Phe Leu Val Phe Gly 
20 25 30 35 

TCT AAA AAA CTG AAA AAT TCA GCA AAT TCT ATG TTG GTT TTG AAA AAA 201 
Ser Lys Lys Leu Lys Asn Ser Ala Asn Ser Met Leu Val Leu Lys Lys 
40 45 50 

GAT TCA ATT TTT ATG CAA AAG TTT TGT TCC TTT AGG ATT TCA GCA TCA 249 
Asp Ser Ile Phe Met Gln Lys Phe Cys Ser Phe Arg Ile Ser Ala Ser 
55 60 65 

GTG GCT ACA GCA CAG AAG CCT TCT GAG ATA GTG TTG CAA CCC ATT AAA 297 
Val Ala Thr Ala Gln Lys Pro Ser Glu Ile Val Leu Gln Pro Ile Lys 
70 75 80 

GAG ATT TCA GGC ACT GTT AAA TTG CCT GGC TCT AAA TCA TTA TCT AAT 345 
Glu Ile Ser Gly Thr Val Lys Leu Pro Gly Ser Lys Ser Leu Ser Asn 
85 90 95 

AGA ATT C 352 
Arg Ile 



( 2 ) INFORMATION FOR SEQ ID NO:17: 

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 101 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: protein 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:17: 

Met Ala Gln Ile Asn Asn Met Ala Gln Gly Ile Gln Thr Leu Asn Pro 
1 5 10 15 

Asn Ser Asn Phe His Lys Pro Gln Val Pro Lys Ser Ser Ser Phe Leu 
20 25 30 

Val Phe Gly Ser Lys Lys Leu Lys Asn Ser Ala Asn Ser Met Leu Val 
35 40 45 

Leu Lys Lys Asp Ser Ile Phe Met Gln Lys Phe Cys Ser Phe Arg Ile 
50 55 60 

Ser Ala Ser Val Ala Thr Ala Gln Lys Pro Ser Glu Ile Val Leu Gln 
65 70 75 80 

Pro Ile Lys Glu Ile Ser Gly Thr Val Lys Leu Pro Gly Ser Lys Ser 
85 90 95 

Leu Ser Asn Arg Ile 
100 



( 2 ) INFORMATION FOR SEQ ID NO:18: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 28 amino acids 
( B ) TYPE: amino acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:18: 

Xaa His Gly Ala Ser Ser Arg Pro Ala Thr Ala Arg Lys Ser Ser Gly 
1 5 10 15 

Leu Xaa Gly Thr Val Arg Ile Pro Gly Asp Lys Met 
20 25 



( 2 ) INFORMATION FOR SEQ ID NO:19: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 13 amino acids 
( B ) TYPE: amino acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:19: 

Ala Pro Ser Met Ile Asp Glu Tyr Pro Ile Leu Ala Val 
1 5 10 



( 2 ) INFORMATION FOR SEQ ID NO:20: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 15 amino acids 
( B ) TYPE: amino acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Ile Thr Gly Leu Leu Glu Gly Glu Asp Val Ile Asn Thr Gly Lys 
1 5 10 15 



( 2 ) INFORMATION FOR SEQ ID NO:21: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 17 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:21: 

ATGATHGAYG ARTAYCC 17 
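
SEQ ID NO:21 through NO:23 are degenerate oligonucleotides written with IUPAC ambiguity codes (H, Y, R, N). As a rough illustration of what such a sequence stands for, the sketch below counts and enumerates the concrete sequences in a degenerate pool. It assumes SEQ ID NO:21 reads ATGATHGAYGARTAYCC (treating the scanned "O" as G); the function names `degeneracy` and `expand` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo: str) -> int:
    """Number of distinct concrete sequences represented by a degenerate oligo."""
    count = 1
    for base in oligo:
        count *= len(IUPAC[base])
    return count


def expand(oligo: str) -> list[str]:
    """Enumerate every concrete sequence in the degenerate pool."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in oligo))]


# Assumed reading of SEQ ID NO:21 (17-mer covering Met-Ile-Asp-Glu-Tyr-Pro).
primer_21 = "ATGATHGAYGARTAYCC"
pool_size = degeneracy(primer_21)
pool = expand(primer_21)
```

Under that reading the pool is small because only four positions are ambiguous; an oligo containing N positions, such as SEQ ID NO:22, grows by a factor of four per N.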



( 2 ) INFORMATION FOR SEQ ID NO:22: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 17 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

GARGAYGTNA THAACAC 17 



( 2 ) INFORMATION FOR SEQ ID NO:23: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 17 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

GARGAYGTNA THAATAC 17 



( 2 ) INFORMATION FOR SEQ ID NO:24: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 38 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

CGTGGATAGA TCTAGGAAGA CAACCATGGC TCACGGTC 38 



( 2 ) INFORMATION FOR SEQ ID NO:25: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 44 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

GGATAGATTA AGGAAGACGC GCATGCTTCA CGGTGCAAGC AGCC 44 



( 2 ) INFORMATION FOR SEQ ID NO:26: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 35 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

GGCTGCCTGA TGAGCTCCAC AATCGCCATC GATGG 35 



( 2 ) INFORMATION FOR SEQ ID NO:27: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 32 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

CGTCGCTCGT CGTGCGTGGC CGCCCTGACG GC 32 



( 2 ) INFORMATION FOR SEQ ID NO:28: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 29 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

CGGGCAAGGC CATGCAGGCT ATGGGCGCC 29 



( 2 ) INFORMATION FOR SEQ ID NO:29: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 31 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:29: 

CGGGCTGCCG CCTGACTATG GGCCTCGTCG G 31 



( 2 ) INFORMATION FOR SEQ ID NO:30: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 15 amino acids 
( B ) TYPE: amino acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: protein 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:30: 

Xaa His Ser Ala Ser Pro Lys Pro Ala Thr Ala Arg Arg Ser Glu 
1 5 10 15 



( 2 ) INFORMATION FOR SEQ ID NO:31: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 17 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:31: 

GCGGTBGCSG GYTTSGG 17 



( 2 ) INFORMATION FOR SEQ ID NO:32: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 16 amino acids 
( B ) TYPE: amino acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

Pro Gly Asp Lys Ser Ile Ser His Arg Ser Phe Met Phe Gly Gly Leu 
1 5 10 15 



( 2 ) INFORMATION FOR SEQ ID NO:33: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 13 amino acids 
( B ) TYPE: amino acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

Leu Asp Phe Gly Asn Ala Ala Thr Gly Cys Arg Leu Thr 
1 5 10 



( 2 ) INFORMATION FOR SEQ ID NO:34: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 26 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

CGGCAATGCC GCCACCGGCG CGCGCC 26 

( 2 ) INFORMATION FOR SEQ ID NO:35: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 49 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:35: 

GGACGGCTGC TTGCACCGTG AAGCATGCTT AAGCTTGGCG TAATCATGG 49 

( 2 ) INFORMATION FOR SEQ ID NO:36: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 35 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

GGAAGACGCC CAGAATTCAC GGTGCAAGCA GCCGG 35 



( 2 ) INFORMATION FOR SEQ ID NO:37: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 5 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( i x ) FEATURE: 

( A ) NAME/KEY: Modified-site 
( B ) LOCATION: 2 

( D ) OTHER INFORMATION: /note= "Xaa at position 2 is Gly, 
Ser, Thr, Cys, Tyr, Asn, Gln, Asp, or Glu" 

( i x ) FEATURE: 

( A ) NAME/KEY: Modified-site 
( B ) LOCATION: 4 

( D ) OTHER INFORMATION: /note= "Xaa at position 4 is Ser 
or Thr" 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

Arg Xaa His Xaa Glu 
1 5 



( 2 ) INFORMATION FOR SEQ ID NO:38: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 4 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( i x ) FEATURE: 

( A ) NAME/KEY: Modified-site 
( B ) LOCATION: 4 

( D ) OTHER INFORMATION: /note= "Xaa at position 4 is Ser 
or Thr" 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

Gly Asp Lys Xaa 
1 



( 2 ) INFORMATION FOR SEQ ID NO:39: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 5 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( i x ) FEATURE: 

( A ) NAME/KEY: Modified-site 
( B ) LOCATION: 4 

( D ) OTHER INFORMATION: /note= "Xaa at position 4 is Ala, 
Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, 
Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, or Val" 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:39: 

Ser Ala Gln Xaa Lys 
1 5 



( 2 ) INFORMATION FOR SEQ ID NO:40: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 4 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: peptide 

( i x ) FEATURE: 

( A ) NAME/KEY: Modified-site 
( B ) LOCATION: 2 

( D ) OTHER INFORMATION: /note= "Xaa at position 2 is Ala, 
Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, 
Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, or Val" 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

Asn Xaa Thr Arg 
1 
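
SEQ ID NO:37 through NO:40 are short motifs with restricted Xaa positions, so each can be written as a pattern and searched for in a protein sequence. The sketch below is one way to do that with regular expressions over one-letter amino acid codes. It assumes the motifs read R-X-H-X-E, G-D-K-X, S-A-Q-X-K and N-X-T-R with the Xaa sets given in the notes; the names `MOTIFS` and `find_motifs` and the test fragment are illustrative only.

```python
import re

# The twenty residues listed for the unrestricted Xaa positions (one-letter codes).
ANY = "ACDEFGHIKLMNPQRSTVWY"

# Motif patterns keyed by SEQ ID number (assumed readings of the listing).
MOTIFS = {
    37: re.compile(r"R[GSTCYNQDE]H[ST]E"),   # Arg Xaa His Xaa Glu
    38: re.compile(r"GDK[ST]"),               # Gly Asp Lys Xaa
    39: re.compile(rf"SAQ[{ANY}]K"),          # Ser Ala Gln Xaa Lys
    40: re.compile(rf"N[{ANY}]TR"),           # Asn Xaa Thr Arg
}


def find_motifs(protein: str) -> dict[int, list[tuple[int, str]]]:
    """Return, per SEQ ID number, the (0-based start, matched text) of every hit."""
    return {
        seq_id: [(m.start(), m.group()) for m in pattern.finditer(protein)]
        for seq_id, pattern in MOTIFS.items()
    }


# Hypothetical test fragment: four short stretches in one-letter code,
# joined end to end purely for illustration.
fragment = "HIPGDKSISHRS" + "PVASAQIKSAV" + "KSRDHTERML" + "GLNPTRTGII"
hits = find_motifs(fragment)
```

A full-length protein in one-letter code can be passed to `find_motifs` in the same way; the returned start positions are 0-based and need one added to compare with the residue numbering used in the listing.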



( 2 ) INFORMATION FOR SEQ ID NO:41: 

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 1287 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: double 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA (genomic) 

( i x ) FEATURE: 

( A ) NAME/KEY: CDS 
( B ) LOCATION: 1..1287 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

ATG AAA CGA GAT AAG GTG CAG ACC TTA CAT GGA GAA ATA CAT ATT CCC 48 
Met Lys Arg Asp Lys Val Gln Thr Leu His Gly Glu Ile His Ile Pro 
1 5 10 15 

GGT GAT AAA TCC ATT TCT CAC CGC TCT GTT ATG TTT GGC GCG CTA GCG 96 
Gly Asp Lys Ser Ile Ser His Arg Ser Val Met Phe Gly Ala Leu Ala 
20 25 30 

GCA GGC ACA ACA ACA GTT AAA AAC TTT CTG CCG GGA GCA GAT TGT CTG 144 
Ala Gly Thr Thr Thr Val Lys Asn Phe Leu Pro Gly Ala Asp Cys Leu 
35 40 45 

AGC ACG ATC GAT TGC TTT AGA AAA ATG GGT GTT CAC ATT GAG CAA AGC 192 
Ser Thr Ile Asp Cys Phe Arg Lys Met Gly Val His Ile Glu Gln Ser 
50 55 60 

AGC AGC GAT GTC GTG ATT CAC GGA AAA GGA ATC GAT GCC CTG AAA GAG 240 
Ser Ser Asp Val Val Ile His Gly Lys Gly Ile Asp Ala Leu Lys Glu 
65 70 75 80 

CCA GAA AGC CTT TTA GAT GTC GGA AAT TCA GGT ACA ACG ATT CGC CTG 288 
Pro Glu Ser Leu Leu Asp Val Gly Asn Ser Gly Thr Thr Ile Arg Leu 
85 90 95 

ATG CTC GGA ATA TTG GCG GGC CGT CCT TTT TAC AGC GCG GTA GCC GGA 336 
Met Leu Gly Ile Leu Ala Gly Arg Pro Phe Tyr Ser Ala Val Ala Gly 
100 105 110 

GAT GAG AGC ATT GCG AAA CGC CCA ATG AAG CGT GTG ACT GAG CCT TTG 384 
Asp Glu Ser Ile Ala Lys Arg Pro Met Lys Arg Val Thr Glu Pro Leu 
115 120 125 

AAA AAA ATG GGG GCT AAA ATC GAC GGC AGA GCC GGC GGA GAG TTT ACA 432 
Lys Lys Met Gly Ala Lys Ile Asp Gly Arg Ala Gly Gly Glu Phe Thr 
130 135 140 

CCG CTG TCA GTG AGC GGC GCT TCA TTA AAA GGA ATT GAT TAT GTA TCA 480 
Pro Leu Ser Val Ser Gly Ala Ser Leu Lys Gly Ile Asp Tyr Val Ser 
145 150 155 160 

CCT GTT GCA AGC GCG CAA ATT AAA TCT GCT GTT TTG CTG GCC GGA TTA 528 
Pro Val Ala Ser Ala Gln Ile Lys Ser Ala Val Leu Leu Ala Gly Leu 
165 170 175 

CAG GCT GAG GGC ACA ACA ACT GTA ACA GAG CCC CAT AAA TCT CGG GAC 576 
Gln Ala Glu Gly Thr Thr Thr Val Thr Glu Pro His Lys Ser Arg Asp 
180 185 190 

CAC ACT GAG CGG ATG CTT TCT GCT TTT GGC GTT AAG CTT TCT GAA GAT 624 
His Thr Glu Arg Met Leu Ser Ala Phe Gly Val Lys Leu Ser Glu Asp 
195 200 205 

CAA ACG AGT GTT TCC ATT GCT GGT GGC CAG AAA CTG ACA GCT GCT GAT 672 
Gln Thr Ser Val Ser Ile Ala Gly Gly Gln Lys Leu Thr Ala Ala Asp 
210 215 220 

ATT TTT GTT CCT GGA GAC ATT TCT TCA GCC GCG TTT TTC CTT GCT GCT 720 
Ile Phe Val Pro Gly Asp Ile Ser Ser Ala Ala Phe Phe Leu Ala Ala 
225 230 235 240 

GGC GCG ATG GTT CCA AAC AGC AGA ATT GTA TTG AAA AAC GTA GGT TTA 768 
Gly Ala Met Val Pro Asn Ser Arg Ile Val Leu Lys Asn Val Gly Leu 
245 250 255 

AAT CCG ACT CGG ACA GGT ATT ATT GAT GTC CTT CAA AAC ATG GGG GCA 816 
Asn Pro Thr Arg Thr Gly Ile Ile Asp Val Leu Gln Asn Met Gly Ala 
260 265 270 

AAA CTT GAA ATC AAA CCA TCT GCT GAT AGC GGT GCA GAG CCT TAT GGA 864 
Lys Leu Glu Ile Lys Pro Ser Ala Asp Ser Gly Ala Glu Pro Tyr Gly 
275 280 285 

GAT TTG ATT ATA GAA ACG TCA TCT CTA AAG GCA GTT GAA ATC GGA GGA 912 
Asp Leu Ile Ile Glu Thr Ser Ser Leu Lys Ala Val Glu Ile Gly Gly 
290 295 300 

GAT ATC ATT CCG CGT TTA ATT GAT GAG ATC CCT ATC ATC GCG CTT CTT 960 
Asp Ile Ile Pro Arg Leu Ile Asp Glu Ile Pro Ile Ile Ala Leu Leu 
305 310 315 320 

GCG ACT CAG GCG GAA GGA ACC ACC GTT ATT AAG GAC GCG GCA GAG CTA 1008 
Ala Thr Gln Ala Glu Gly Thr Thr Val Ile Lys Asp Ala Ala Glu Leu 
325 330 335 

AAA GTG AAA GAA ACA AAC CGT ATT GAT ACT GTT GTT TCT GAG CTT CGC 1056 
Lys Val Lys Glu Thr Asn Arg Ile Asp Thr Val Val Ser Glu Leu Arg 
340 345 350 

AAG CTG GGT GCT GAA ATT GAA CCG ACA GCA GAT GGA ATG AAG GTT TAT 1104 
Lys Leu Gly Ala Glu Ile Glu Pro Thr Ala Asp Gly Met Lys Val Tyr 
355 360 365 

GGC AAA CAA ACG TTG AAA GGC GGC GCT GCA GTG TCC AGC CAC GGA GAT 1152 
Gly Lys Gln Thr Leu Lys Gly Gly Ala Ala Val Ser Ser His Gly Asp 
370 375 380 

CAT CGA ATC GGA ATG ATG CTT GGT ATT GCT TCC TGT ATA ACG GAG GAG 1200 
His Arg Ile Gly Met Met Leu Gly Ile Ala Ser Cys Ile Thr Glu Glu 
385 390 395 400 

CCG ATT GAA ATC GAG CAC ACG GAT GCC ATT CAC GTT TCT TAT CCA ACC 1248 
Pro Ile Glu Ile Glu His Thr Asp Ala Ile His Val Ser Tyr Pro Thr 
405 410 415 

TTC TTC GAG CAT TTA AAT AAG CTT TCG AAA AAA TCC TGA 1287 
Phe Phe Glu His Leu Asn Lys Leu Ser Lys Lys Ser 
420 425 

( 2 ) INFORMATION FOR SEQ ID NO:42: 

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 428 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: protein 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:42: 

Met Lys Arg Asp Lys Val Gln Thr Leu His Gly Glu Ile His Ile Pro 
1 5 10 15 

Gly Asp Lys Ser Ile Ser His Arg Ser Val Met Phe Gly Ala Leu Ala 
20 25 30 

Ala Gly Thr Thr Thr Val Lys Asn Phe Leu Pro Gly Ala Asp Cys Leu 
35 40 45 

Ser Thr Ile Asp Cys Phe Arg Lys Met Gly Val His Ile Glu Gln Ser 
50 55 60 

Ser Ser Asp Val Val Ile His Gly Lys Gly Ile Asp Ala Leu Lys Glu 
65 70 75 80 

Pro Glu Ser Leu Leu Asp Val Gly Asn Ser Gly Thr Thr Ile Arg Leu 
85 90 95 

Met Leu Gly Ile Leu Ala Gly Arg Pro Phe Tyr Ser Ala Val Ala Gly 
100 105 110 

Asp Glu Ser Ile Ala Lys Arg Pro Met Lys Arg Val Thr Glu Pro Leu 
115 120 125 

Lys Lys Met Gly Ala Lys Ile Asp Gly Arg Ala Gly Gly Glu Phe Thr 
130 135 140 

Pro Leu Ser Val Ser Gly Ala Ser Leu Lys Gly Ile Asp Tyr Val Ser 
145 150 155 160 

Pro Val Ala Ser Ala Gln Ile Lys Ser Ala Val Leu Leu Ala Gly Leu 
165 170 175 

Gln Ala Glu Gly Thr Thr Thr Val Thr Glu Pro His Lys Ser Arg Asp 
180 185 190 

His Thr Glu Arg Met Leu Ser Ala Phe Gly Val Lys Leu Ser Glu Asp 
195 200 205 

Gln Thr Ser Val Ser Ile Ala Gly Gly Gln Lys Leu Thr Ala Ala Asp 
210 215 220 

Ile Phe Val Pro Gly Asp Ile Ser Ser Ala Ala Phe Phe Leu Ala Ala 
225 230 235 240 

Gly Ala Met Val Pro Asn Ser Arg Ile Val Leu Lys Asn Val Gly Leu 
245 250 255 

Asn Pro Thr Arg Thr Gly Ile Ile Asp Val Leu Gln Asn Met Gly Ala 
260 265 270 

Lys Leu Glu Ile Lys Pro Ser Ala Asp Ser Gly Ala Glu Pro Tyr Gly 
275 280 285 

Asp Leu Ile Ile Glu Thr Ser Ser Leu Lys Ala Val Glu Ile Gly Gly 
290 295 300 

Asp Ile Ile Pro Arg Leu Ile Asp Glu Ile Pro Ile Ile Ala Leu Leu 
305 310 315 320 

Ala Thr Gln Ala Glu Gly Thr Thr Val Ile Lys Asp Ala Ala Glu Leu 
325 330 335 

Lys Val Lys Glu Thr Asn Arg Ile Asp Thr Val Val Ser Glu Leu Arg 
340 345 350 

Lys Leu Gly Ala Glu Ile Glu Pro Thr Ala Asp Gly Met Lys Val Tyr 
355 360 365 

Gly Lys Gln Thr Leu Lys Gly Gly Ala Ala Val Ser Ser His Gly Asp 
370 375 380 

His Arg Ile Gly Met Met Leu Gly Ile Ala Ser Cys Ile Thr Glu Glu 
385 390 395 400 

Pro Ile Glu Ile Glu His Thr Asp Ala Ile His Val Ser Tyr Pro Thr 
405 410 415 

Phe Phe Glu His Leu Asn Lys Leu Ser Lys Lys Ser 
420 425 

( 2 ) INFORMATION FOR SEQ ID NO:43: 

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 1293 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: double 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: DNA (genomic) 

( i x ) FEATURE: 

( A ) NAME/KEY: CDS 
( B ) LOCATION: 1..1293 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

ATG GTA AAT GAA CAA ATC ATT GAT ATT TCA GGT CCG TTA AAG GGC GAA 48 
Met Val Asn Glu Gln Ile Ile Asp Ile Ser Gly Pro Leu Lys Gly Glu 
1 5 10 15 

ATA GAA GTG CCG GGC GAT AAG TCA ATG ACA CAC CGT GCA ATC ATG TTG 96 
Ile Glu Val Pro Gly Asp Lys Ser Met Thr His Arg Ala Ile Met Leu 
20 25 30 

GCG TCG CTA GCT GAA GGT GTA TCT ACT ATA TAT AAG CCA CTA CTT GGC 144 
Ala Ser Leu Ala Glu Gly Val Ser Thr Ile Tyr Lys Pro Leu Leu Gly 
35 40 45 

GAA GAT TGT CGT CGT ACG ATG GAC ATT TTC CGA CAC TTA GGT GTA GAA 192 
Glu Asp Cys Arg Arg Thr Met Asp Ile Phe Arg His Leu Gly Val Glu 
50 55 60 

ATC AAA GAA GAT GAT GAA AAA TTA GTT GTG ACT TCC CCA GGA TAT CAA 240 
Ile Lys Glu Asp Asp Glu Lys Leu Val Val Thr Ser Pro Gly Tyr Gln 
65 70 75 80 

GTT AAC ACG CCA CAT CAA GTA TTG TAT ACA GGT AAT TCT GGT ACG ACA 288 
Val Asn Thr Pro His Gln Val Leu Tyr Thr Gly Asn Ser Gly Thr Thr 
85 90 95 

ACA CGA TTA TTG GCA GGT TTG TTA AGT GGT TTA GGT AAT GAA AGT GTT 336 
Thr Arg Leu Leu Ala Gly Leu Leu Ser Gly Leu Gly Asn Glu Ser Val 
100 105 110 

TTG TCT GGC GAT GTT TCA ATT GGT AAA AGG CCA ATG GAT CGT GTC TTG 384 
Leu Ser Gly Asp Val Ser Ile Gly Lys Arg Pro Met Asp Arg Val Leu 
115 120 125 

AGA CCA TTG AAA CTT ATG GAT GCG AAT ATT GAA GGT ATT GAA GAT AAT 432 
Arg Pro Leu Lys Leu Met Asp Ala Asn Ile Glu Gly Ile Glu Asp Asn 
130 135 140 

TAT ACA CCA TTA ATT ATT AAG CCA TCT GTC ATA AAA GGT ATA AAT TAT 480 
Tyr Thr Pro Leu Ile Ile Lys Pro Ser Val Ile Lys Gly Ile Asn Tyr 
145 150 155 160 

CAA ATG GAA GTT GCA AGT GCA CAA GTA AAA AGT GCC ATT TTA TTT GCA 528 
Gln Met Glu Val Ala Ser Ala Gln Val Lys Ser Ala Ile Leu Phe Ala 
165 170 175 

AGT TTG TTT TCT AAG GAA CCG ACC ATC ATT AAA GAA TTA GAT GTA AGT 576 
Ser Leu Phe Ser Lys Glu Pro Thr Ile Ile Lys Glu Leu Asp Val Ser 
180 185 190 

CGA AAT CAT ACT GAG ACG ATG TTC AAA CAT TTT AAT ATT CCA ATT GAA 624 
Arg Asn His Thr Glu Thr Met Phe Lys His Phe Asn Ile Pro Ile Glu 
195 200 205 

GCA GAA GGG TTA TCA ATT AAT ACA ACC CCT GAA GCA ATT CGA TAC ATT 672 
Ala Glu Gly Leu Ser Ile Asn Thr Thr Pro Glu Ala Ile Arg Tyr Ile 
210 215 220 

AAA CCT GCA GAT TTT CAT GTT CCT GGC GAT ATT TCA TCT GCA GCG TTC 720 
Lys Pro Ala Asp Phe His Val Pro Gly Asp Ile Ser Ser Ala Ala Phe 
225 230 235 240 

TTT ATT GTT GCA GCA CTT ATC ACA CCA GGA AGT GAT GTA ACA ATT CAT 768 
Phe Ile Val Ala Ala Leu Ile Thr Pro Gly Ser Asp Val Thr Ile His 
245 250 255 

AAT GTT GGA ATC AAT CAA ACA CGT TCA GGT ATT ATT GAT ATT GTT GAA 816 
Asn Val Gly Ile Asn Gln Thr Arg Ser Gly Ile Ile Asp Ile Val Glu 
260 265 270 

AAA ATG GGC GGT AAT ATC CAA CTT TTC AAT CAA ACA ACT GGT GCT GAA 864 
Lys Met Gly Gly Asn Ile Gln Leu Phe Asn Gln Thr Thr Gly Ala Glu 
275 280 285 

CCT ACT GCT TCT ATT CGT ATT CAA TAC ACA CCA ATG CTT CAA CCA ATA 912 
Pro Thr Ala Ser Ile Arg Ile Gln Tyr Thr Pro Met Leu Gln Pro Ile 
290 295 300 

ACA ATC GAA GGA GAA TTA GTT CCA AAA GCA ATT GAT GAA CTG CCT GTA 960 
Thr Ile Glu Gly Glu Leu Val Pro Lys Ala Ile Asp Glu Leu Pro Val 
305 310 315 320 

ATA GCA TTA CTT TGT ACA CAA GCA GTT GGC ACG AGT ACA ATT AAA GAT 1008 
Ile Ala Leu Leu Cys Thr Gln Ala Val Gly Thr Ser Thr Ile Lys Asp 
325 330 335 

GCC GAG GAA TTA AAA GTA AAA GAA ACA AAT AGA ATT GAT ACA ACG GCT 1056 
Ala Glu Glu Leu Lys Val Lys Glu Thr Asn Arg Ile Asp Thr Thr Ala 
340 345 350 

GAT ATG TTA AAC TTG TTA GGG TTT GAA TTA CAA CCA ACT AAT GAT GGA 1104 
Asp Met Leu Asn Leu Leu Gly Phe Glu Leu Gln Pro Thr Asn Asp Gly 
355 360 365 

TTG ATT ATT CAT CCG TCA GAA TTT AAA ACA AAT GCA ACA GAT ATT TTA 1152 
Leu Ile Ile His Pro Ser Glu Phe Lys Thr Asn Ala Thr Asp Ile Leu 
370 375 380 

ACT GAT CAT CGA ATA GGA ATG ATG CTT GCA GTT GCT TGT GTA CTT TCA 1200 
Thr Asp His Arg Ile Gly Met Met Leu Ala Val Ala Cys Val Leu Ser 
385 390 395 400 

AGC GAG CCT GTC AAA ATC AAA CAA TTT GAT GCT GTA AAT GTA TCA TTT 1248 
Ser Glu Pro Val Lys Ile Lys Gln Phe Asp Ala Val Asn Val Ser Phe 
405 410 415 

CCA GGA TTT TTA CCA AAA CTA AAG CTT TTA CAA AAT GAG GGA TAA 1293 
Pro Gly Phe Leu Pro Lys Leu Lys Leu Leu Gln Asn Glu Gly 
420 425 430 



( 2 ) INFORMATION FOR SEQ ID NO:44: 

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 430 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: protein 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

Met Val Asn Glu Gln Ile Ile Asp Ile Ser Gly Pro Leu Lys Gly Glu 
1 5 10 15 

Ile Glu Val Pro Gly Asp Lys Ser Met Thr His Arg Ala Ile Met Leu 
20 25 30 

Ala Ser Leu Ala Glu Gly Val Ser Thr Ile Tyr Lys Pro Leu Leu Gly 
35 40 45 

Glu Asp Cys Arg Arg Thr Met Asp Ile Phe Arg His Leu Gly Val Glu 
50 55 60 

Ile Lys Glu Asp Asp Glu Lys Leu Val Val Thr Ser Pro Gly Tyr Gln 
65 70 75 80 

Val Asn Thr Pro His Gln Val Leu Tyr Thr Gly Asn Ser Gly Thr Thr 
85 90 95 

Thr Arg Leu Leu Ala Gly Leu Leu Ser Gly Leu Gly Asn Glu Ser Val 
100 105 110 

Leu Ser Gly Asp Val Ser Ile Gly Lys Arg Pro Met Asp Arg Val Leu 
115 120 125 

Arg Pro Leu Lys Leu Met Asp Ala Asn Ile Glu Gly Ile Glu Asp Asn 
130 135 140 

Tyr Thr Pro Leu Ile Ile Lys Pro Ser Val Ile Lys Gly Ile Asn Tyr 
145 150 155 160 

Gln Met Glu Val Ala Ser Ala Gln Val Lys Ser Ala Ile Leu Phe Ala 
165 170 175 

Ser Leu Phe Ser Lys Glu Pro Thr Ile Ile Lys Glu Leu Asp Val Ser 
180 185 190 

Arg Asn His Thr Glu Thr Met Phe Lys His Phe Asn Ile Pro Ile Glu 
195 200 205 

Ala Glu Gly Leu Ser Ile Asn Thr Thr Pro Glu Ala Ile Arg Tyr Ile 
210 215 220 

Lys Pro Ala Asp Phe His Val Pro Gly Asp Ile Ser Ser Ala Ala Phe 
225 230 235 240 

Phe Ile Val Ala Ala Leu Ile Thr Pro Gly Ser Asp Val Thr Ile His 
245 250 255 

Asn Val Gly Ile Asn Gln Thr Arg Ser Gly Ile Ile Asp Ile Val Glu 
260 265 270 

Lys Met Gly Gly Asn Ile Gln Leu Phe Asn Gln Thr Thr Gly Ala Glu 
275 280 285 

Pro Thr Ala Ser Ile Arg Ile Gln Tyr Thr Pro Met Leu Gln Pro Ile 
290 295 300 

Thr Ile Glu Gly Glu Leu Val Pro Lys Ala Ile Asp Glu Leu Pro Val 
305 310 315 320 

Ile Ala Leu Leu Cys Thr Gln Ala Val Gly Thr Ser Thr Ile Lys Asp 
325 330 335 

Ala Glu Glu Leu Lys Val Lys Glu Thr Asn Arg Ile Asp Thr Thr Ala 
340 345 350 

Asp Met Leu Asn Leu Leu Gly Phe Glu Leu Gln Pro Thr Asn Asp Gly 
355 360 365 

Leu Ile Ile His Pro Ser Glu Phe Lys Thr Asn Ala Thr Asp Ile Leu 
370 375 380 

Thr Asp His Arg Ile Gly Met Met Leu Ala Val Ala Cys Val Leu Ser 
385 390 395 400 

Ser Glu Pro Val Lys Ile Lys Gln Phe Asp Ala Val Asn Val Ser Phe 
405 410 415 

Pro Gly Phe Leu Pro Lys Leu Lys Leu Leu Gln Asn Glu Gly 
420 425 430 



( 2 ) INFORMATION FOR SEQ ID NO:45: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 28 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:45: 

GGAACATATG AAACGAGATA AGGTGCAG 28 



( 2 ) INFORMATION FOR SEQ ID NO:46: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 35 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:46: 

GGAATTCAAA CTTCAGGATC TTGAGATAGA AAATG 35 



( 2 ) INFORMATION FOR SEQ ID NO:47: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 28 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

GGGGCCATGG TAAATGAACA AATCATTG 28 



( 2 ) INFORMATION FOR SEQ ID NO:48: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 33 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: Other nucleic acid 

( A ) DESCRIPTION: Synthetic DNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:48: 

GGGGGAGCTC ATTATCCCTC ATTTTGTAAA AGC 33 



( 2 ) INFORMATION FOR SEQ ID NO:49: 

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 480 amino acids 
( B ) TYPE: amino acid 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: protein 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:49: 

Leu Thr Aap Ola Thr Leo Val Tyr Pro Pbe Lys Asp lie Pro Ala Asp 
1 5 10 15 

Gin Gin Lys Val Val lie Pro Pro Gly Ser Lys Ser lie Sei Am Arg 
2 0 2 5 3 0 

Ala Leo lie Leu Ala Ala Leo Oly Glo Gly Gin Cya Lys lie Lys Asn 
3 5 4 0 4 5 

Leo Lea His Ser Asp Asp Xbr Lys His Met Leo Thr Ala Val His Glo 
5 0 5 5 6 0 

Lea Lys Gly Ala Tbr lie Ser Tip Glo Asp Asa Gly Glo Tbr Val Val 
65 70 75 80 

Val Glo Oly His Gly Gly Ser Tbr Leo Ser Ala Cys Ala Asp Pro Leo 
8 5 9 0 9 5 

Tyr Leo Gly Asa Ala Oly Tbr Ala Ser Arg Pbe Leo Tbr Ser Leo Ala 
10 0 10 5 110 

Ala Lea Val Asn Ser Tbr Ser Ser Oln Lys Tyr lie Val Lea Thr Gly 
115 12 0 12 5 

Asn Ala Arg Met Gin Oln Arg Pro lie Ala Pro Leo Val Asp Ser Leo 
13 0 13 5 14 0 

Arg Ala Asn Gly Thr Lys lie Ola Tyr Leo Asn Asn Glo Oly Ser Lea 
145 150 155 160 

Pro lie Lys Val Tyr Thr Asp Ser Val Phe Lys Gly Oly Arg lie Olo 
16 5 17 0 17 5 

Leo Ala Ala Thr Val Ser Ser Gin Tyr Val Ser Ser lie Leo Met Cys 
18 0 18 5 19 0 

Ala Pro Tyr Ala Glo Olo Pro Val Thr Leo Ala Leu Val Gly Oly Lys 
195 200 205 

Pro lie Ser Lys Leo Tyr Val Asp Met Thr lie Lys Met Met OIu Lys 
2 10 2 15 2 2 0 

Phe Oly lie Asn Val Glo Thr Ser Thr Thr Glo Pro Tyr Thr Tyr Tyr 
225 230 235 240 

lie Pro Lys Oly His Tyr lie Asn Pro Ser Glo Tyr Val lie Glo Ser 
245 25 0 255 

Asp Ala Ser Ser Ala Thr Tyr Pro Leo Ala Phe Ala Ala Met Tbr Oly 
260 265 270 

Thr Thr Val Thr Val Pro Asn lie Gly Pbe Glo Ser Leo Gin Gly Asp 
275 280 285 

Ala Arg Pbe Ala Arg Asp Val Leo Lys Pro Met Gly Cys Lys lie Thr 
290 295 300 

Oln Thr Ala Tbr Ser Thr Thr Val Ser Oly Pro Pro Val Gly Tbr Leo 
305 310 315 320 

Lys Pro Loo Lys His Val Asp Met Glo Pro Met Thr Asp Ala Phe Leo 

325 330 335 

Tbr Ala Cys Val Val Ala Ala lie Ser His Asp Ser Asp Pro Asn Ser 
340 345 350 

Ala Asn Thr Tbr Thr lie Glo Gly lie Ala Asn Oln Arg Val Lys Glo 

355 360 365 

Cys Asn Arg lie Leo Ala Met Ala Thr Glo Leo Ala Lys Pbe Gly Val 
370 375 380 

Lys Tbr Thr Glo Leo Pro Aap Gly lie Gin Val His Gly Leo Asn Ser 
385 390 395 400 

Ile Lys Asp Leu Lys Val Pro Ser Asp Ser Ser Gly Pro Val Gly Val 
405 410 415 

Cys Thr Tyr Asp Asp His Arg Val Ala Met Ser Phe Ser Leu Leu Ala 
420 425 430 

Gly Mot Val Am Ser Gin Asa Glo Arg Asp Glu Val Ala Asn Pro Val 
435 440 445 

Arg lie Leu Gin Arg His Cys Thr Oly Lys Thr Trp Pro Gly Trp Trp 
450 455 460 

Asp Val Loo Bis Ser GIo Leo Gly Ala Lys Leo Asp Gly Ala Olo Pro 
465 47 0 475 480 

( 2 ) INFORMATION FOR SEQ ID NO:50:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 460 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:50:

Leu Ala Pro Ser Ile Glu Val His Pro Gly Val Ala His Ser Ser Asn
1 5 10 15

Val Ile Cys Ala Pro Pro Gly Ser Lys Ser Ile Ser Asn Arg Ala Leu
20 25 30

Val Leu Ala Ala Leu Gly Ser Gly Thr Cys Arg Ile Lys Asn Leu Leu
35 40 45

His Ser Asp Asp Thr Gin Val Met Leu Asn Ala Leu Glu Arg Leu Gly
50 55 60

Ala Ala Thr Phe Ser Trp Glu Glu Glu Gly Glu Val Leu Val Val Asn
65 70 75 80

Gly Lys Gly Gly Asn Leu Gin Ala Ser Ser Ser Pro Leu Tyr Leu Gly
85 90 95

Asn Ala Gly Thr Ala Ser Arg Phe Leu Thr Thr Val Ala Thr Leu Ala
100 105 110

Asn Ser Ser Thr Val Asp Ser Ser Val Leu Thr Gly Asn Asn Arg Met
115 120 125

Lys Gin Arg Pro Ile Gly Asp Leu Val Asp Ala Leu Thr Ala Asn Val
130 135 140

Leu Pro Leu Asn Thr Ser Lys Gly Arg Ala Ser Leu Pro Leu Lys Ile
145 150 155 160

Ala Ala Ser Gly Gly Phe Ala Gly Gly Asn Ile Asn Leu Ala Ala Lys
165 170 175

Val Ser Ser Gin Tyr Val Ser Ser Leu Leu Met Cys Ala Pro Tyr Ala
180 185 190

Lys Glu Pro Val Thr Leu Arg Leu Val Gly Gly Lys Pro Ile Ser Gin
195 200 205

Pro Tyr Ile Asp Met Thr Thr Ala Met Met Arg Ser Phe Gly Ile Asp
210 215 220

Val Gin Lys Ser Thr Thr Glu Glu His Thr Tyr His Ile Pro Gin Gly
225 230 235 240

Arg Tyr Val Asn Pro Ala Glu Tyr Val Ile Gin Ser Asp Ala Ser Cys
245 250 255

Ala Thr Tyr Pro Leu Ala Val Ala Ala Val Thr Gly Thr Thr Cys Thr
260 265 270

Val Pro Asn Ile Gly Ser Ala Ser Leu Gin Gly Asp Ala Arg Phe Ala
275 280 285

Val Glu Val Leu Arg Pro Met Gly Cys Thr Val Glu Gin Thr Glu Thr
290 295 300






Ser Thr Thr Val Thr Gly Pro Ser Asp Gly Ile Leu Arg Ala Thr Ser
305 310 315 320

Lys Arg Gly Tyr Gly Thr Asn Asp Arg Cys Val Pro Arg Cys Phe Arg
325 330 335

Thr Gly Ser His Arg Pro Met Gla Lys Ser Gin Thr Thr Pro Pro Val
340 345 350

Ser Ser Gly Ile Ala Asn Gin Arg Val Lys Gin Cys Asn Arg Ile Lys
355 360 365

Ala Met Lys Asp Gin Leu Ala Lys Phe Gly Val Ile Cys Arg Gin His
370 375 380

Asp Asp Gly Leu Gin Ile Asp Gly Ile Asp Arg Ser Asn Leu Arg Gin
385 390 395 400

Pro Val Gly Gly Val Phe Cys Tyr Asp Asp His Arg Val Ala Phe Ser
405 410 415

Phe Ser Val Leu Ser Leu Val Thr Pro Gin Pro Thr Leu Ile Leu Gin
420 425 430

Lys Gin Cys Val Gly Lys Thr Trp Pro Gly Trp Trp Asp Thr Leu Arg
435 440 445

Gin Leu Phe Lys Val Lys Leu Gin Gly Lys Gin Leu
450 455 460

( 2 ) INFORMATION FOR SEQ ID NO:51:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 444 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:51:

Lys Ala Ser Gin Ile Val Leu Gin Pro Ile Arg Gin Ile Ser Gly Leu
1 5 10 15

Ile Lys Leu Pro Gly Ser Lys Ser Leu Ser Asn Arg Ile Leu Leu Leu
20 25 30

Ala Ala Leu Ser Gin Gly Thr Thr Val Val Asp Asn Leu Leu Asn Ser
35 40 45

Asp Asp Ile Asn Tyr Met Leu Asp Ala Leu Lys Lys Leu Gly Leu Asn
50 55 60

Val Gin Arg Asp Ser Val Asn Asn Arg Ala Val Val Gin Gly Cys Gly
65 70 75 80

Gly Ile Phe Pro Ala Ser Leu Asp Ser Lys Ser Asp Ile Gin Leu Tyr
85 90 95

Leu Gly Asn Ala Gly Thr Ala Met Arg Pro Leu Thr Ala Ala Val Thr
100 105 110

Ala Ala Gly Gly Asn Ala Ser Tyr Val Leu Asp Gly Val Pro Arg Met
115 120 125

Arg Ola Arg Pro Ile Gly Asp Leu Val Val Gly Leu Lys Gin Leu Gly
130 135 140

Ala Asp Val Gin Cys Thr Leu Gly Thr Asn Cys Pro Pro Val Arg Val
145 150 155 160

Asn Ala Asn Gly Gly Leu Pro Gly Gly Lys Val Lys Leu Ser Gly Ser
165 170 175

Ile Ser Ser Gin Tyr Leu Thr Ala Leu Leu Met Ala Ala Pro Leu Ala
180 185 190

Leu Gly Asp Val Gin Ile Gin Ile Ile Asp Lys Leu Ile Ser Val Pro
195 200 205

Tyr Val Glu Met Thr Leu Lys Leu Met Glu Arg Phe Gly Val Ser Ala
210 215 220

Glu His Ser Asp Ser Trp Asp Arg Phe Pbo Val Lys Gly Gly Gin Lys
225 230 235 240

Tyr Lys Ser Pro Gly Asn Ala Tyr Val Glu Gly Asp Ala Ser Ser Ala
245 250 255

Ser Tyr Phe Leu Ala Gly Ala Ala Ile Thr Gly Glu Thr Val Thr Val
260 265 270

Glu Gly Cys Gly Thr Thr Ser Leu Gin Gly Asp Val Lys Phe Ala Glu
275 280 285

Val Leu Glu Lys Met Gly Cys Lys Val Ser Trp Thr Glu Asa Ser Val
290 295 300

Thr Val Thr Gly Pro Ser Arg Asp Ala Phe Gly Met Arg His Leu Arg
305 310 315 320

Ala Val Asp Val Asn Met Asn Lys Met Pro Asp Val Ala Met Thr Leu
325 330 335

Ala Val Val Ala Leu Phe Ala Asp Gly Pro Thr Thr Ile Arg Asp Val
340 345 350

Ala Ser Trp Arg Val Lys Glu Thr Glu Arg Met Ile Ala Ile Cys Thr
355 360 365

Glu Leu Arg Lys Leu Gly Ala Thr Val Glu Glu Gly Ser Asp Tyr Cys
370 375 380

Val Ile Thr Pro Pro Ala Lys Val Lys Pro Ala Glu Ile Asp Thr Tyr
385 390 395 400

Asp Asp His Arg Met Ala Met Ala Phe Ser Leu Ala Ala Cys Ala Asp
405 410 415

Val Pro Val Thr Ile Lys Asp Pro Gly Cys Thr Arg Lys Thr Phe Pro
420 425 430

Asp Tyr Phe Gin Val Leu Glu Ser Ile Thr Lys His
435 440

( 2 ) INFORMATION FOR SEQ ID NO:52:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 444 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:52:

Lys Ala Ser Glu Ile Val Leu Gin Pro Ile Arg Glu Ile Ser Gly Leu
1 5 10 15

Ile Lys Leu Pro Gly Ser Lys Ser Leu Ser Asn Arg Ile Leu Leu Leu
20 25 30

Ala Ala Leu Ser Glu Gly Thr Thr Val Val Asp Asn Leu Leu Asn Ser
35 40 45

Asp Asp Ile Asn Tyr Met Leu Asp Ala Leu Lys Arg Leu Gly Leu Asn
50 55 60

Val Glu Thr Asp Ser Glu Asn Asn Arg Ala Val Val Glu Gly Cys Gly
65 70 75 80

Gly Ile Phe Pro Ala Ser Ile Asp Ser Lys Ser Asp Ile Glu Leu Tyr
85 90 95

Leu Gly Asn Ala Gly Thr Ala Met Arg Pro Leu Thr Ala Ala Val Thr
100 105 110
Ala Ala Gly Gly Asa Ala Ser Tyr Val Leu Asp Gly Val Pro Arg Met
115 120 125

Arg Glu Arg Pro Ile Gly Asp Leu Val Val Gly Leu Lys Ola Leu Gly
130 135 140

Ala Asp Val Ola Cys Tar Leu Gly Tar Asn Cys Pro Pro Val Arg Val
145 150 155 160

Asa Ala Asa Gly Gly Leu Pro Gly Gly Lys Val Lys Leu Ser Gly Ser
165 170 175

Ile Ser Ser Gin Tyr Leu Thr Ala Leu Leu Met Ser Ala Pro Leu Ala
180 185 190

Leu Gly Asp Val Glu Ile Glu Ile Val Asp Lys Leu Ile Ser Val Pro
195 200 205

Tyr Val Glu Met Thr Leu Lys Leu Met Glu Arg Phe Gly Val Ser Val
210 215 220

Glu His Ser Asp Ser Trp Asp Arg Phe Phe Val Lys Gly Gly Ola Lys
225 230 235 240

Tyr Lys Ser Pro Gly Asn Ala Tyr Val Glu Gly Asp Ala Ser Ser Ala
245 250 255

Cys Tyr Phe Leu Ala Gly Ala Ala Ile Thr Gly Glu Thr Val Thr Val
260 265 270

Glu Gly Cys Gly Thr Thr Ser Leu Ola Gly Asp Val Lys Phe Ala Glu
275 280 285

Val Leu Glu Lys Met Gly Cys Lys Val Ser Trp Thr Ola Asa Ser Val
290 295 300

Thr Val Thr Gly Pro Pro Arg Asp Ala Phe Gly Met Arg His Leu Arg
305 310 315 320

Ala Ile Asp Val Asa Met Asa Lys Met Pro Asp Val Ala Met Thr Leu
325 330 335

Ala Val Val Ala Leu Phe Ala Asp Gly Pro Thr Thr Ile Arg Asp Val
340 345 350

Ala Ser Trp Arg Val Lys Glu Thr Glu Arg Met Ile Ala Ile Cys Thr
355 360 365

Glu Leu Arg Lys Leu Gly Ala Thr Val Glu Glu Gly Ser Asp Tyr Cys
370 375 380

Val Ile Thr Pro Pro Lys Lys Val Lys Thr Ala Glu Ile Asp Thr Tyr
385 390 395 400

Asp Asp His Arg Met Ala Met Ala Phe Ser Leu Ala Ala Cys Ala Asp
405 410 415

Val Pro Ile Thr Ile Asa Asp Ser Gly Cys Thr Arg Lys Thr Phe Pro
420 425 430

Asp Tyr Phe Gla Val Leu Glu Arg Ile Thr Lys His
435 440



( 2 ) INFORMATION FOR SEQ ID NO:53:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 444 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:53:

Lys Pro Asa Glu Ile Val Leu Ola Pro Ile Lys Asp Ile Ser Gly Thr
1 5 10 15

Val Lys Leu Pro Gly Ser Lys Ser Leu Ser Asa Arg Ile Leu Leu Leu
20 25 30
Ala Ala Leu Ser Lys Gly Arg Thr Val Val Asp Asa Las Leu Ser Ser
35 40 45

Asp Asp Ile His Tyr Met Leu Gly Ala Leu Lys Thr Leu Gly Leu His
50 55 60

Val Ola Asp Asp Asn Gin Asn Gin Arg Ala Ile Val Glu Gly Cys Gly
65 70 75 80

Gly Gin Phe Pro Val Gly Lys Lys Ser Glu Gin Glu Ile Gin Leu Phe
85 90 95

Leu Gly Asn Ala Gly Thr Ala Met Arg Pro Leu Thr Ala Ala Val Thr
100 105 110

Val Ala Gly Gly His Ser Arg Tyr Val Leu Asp Gly Val Pro Arg Met
115 120 125

Arg Glu Arg Pro Ile Gly Asp Leu Val Asp Gly Leu Lys Gin Leu Gly
130 135 140

Ala Glu Val Asp Cys Phe Leu Gly Thr Asn Cys Pro Pro Val Arg Ile
145 150 155 160

Val Ser Lys Gly Gly Leu Pro Gly Gly Lys Val Lys Leu Ser Gly Ser
165 170 175

Ile Ser Ser Gin Tyr Leu Thr Ala Leu Leu Met Ala Ala Pro Leu Ala
180 185 190

Leu Gly Asp Val Glu Ile Glu Ile Ile Asp Lys Leu Ile Ser Val Pro
195 200 205

Tyr Val Glu Met Thr Leu Lys Leu Met Glu Arg Phe Gly Val Ser Val
210 215 220

Glu His Thr Ser Ser Trp Asp Lys Phe Leu Val Arg Gly Gly Oln Lys
225 230 235 240

Tyr Lys Ser Pro Gly Lys Ala Tyr Val Glu Gly Asp Ala Ser Ser Ala
245 250 255

Ser Tyr Phe Leu Ala Gly Ala Ala Val Thr Gly Gly Thr Val Thr Val
260 265 270

Glu Gly Cys Gly Thr Ser Ser Leu Gin Gly Asp Val Lys Phe Ala Glu
275 280 285

Val Leu Gin Lys Met Gly Ala Glu Val Thr Trp Thr Glu Asn Ser Val
290 295 300

Thr Val Lys Gly Pro Pro Arg Asn Ser Ser Gly Met Lys His Leu Arg
305 310 315 320

Ala Val Asp Val Asn Met Asn Lys Met Pro Asp Val Ala Met Thr Leu
325 330 335

Ala Val Val Ala Leu Phe Ala Asp Gly Pro Thr Ala Ile Arg Asp Val
340 345 350

Ala Ser Trp Arg Val Lys Glu Thr Oln Arg Met Ile Ala Ile Cys Thr
355 360 365

Glu Leu Arg Lys Leu Gly Ala Thr Val Val Glu Gly Ser Asp Tyr Cys
370 375 380

Ile Ile Thr Pro Pro Glu Lys Leu Asn Val Thr Glu Ile Asp Thr Tyr
385 390 395 400

Asp Asp His Arg Met Ala Met Ala Phe Ser Leu Ala Ala Cys Ala Asp
405 410 415

Val Pro Val Thr Ile Lys Asp Pro Gly Cys Thr Arg Lys Thr Phe Pro
420 425 430

Asn Tyr Phe Asp Val Leu Oln Oln Tyr Ser Lys His
435 440

( 2 ) INFORMATION FOR SEQ ID NO:54:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 444 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:54:

Lys Pro His Glu Ile Val Leu Xaa Pro Ile Lys Asp Ile Ser Gly Thr
1 5 10 15

Val Lys Leu Pro Gly Ser Lys Ser Leu Ser Am Arg Ile Leu Leu Leu
20 25 30

Ala Ala Leu Ser Glu Gly Arg Thr Val Val Asp Am Leu Leu Ser Ser
35 40 45

Asp Asp Ile His Tyr Met Leu Gly Ala Leu Lys Thr Leu Gly Leu His
50 55 60

Val Glu Asp Asp Asa Glu Am Gin Arg Ala Ile Val Glu Gly Cys Gly
65 70 75 80

Gly Gin Phe Pro Val Gly Lys Lys Ser Glu Glu Glu Ile Gin Leu Phe
85 90 95

Leu Gly Asn Ala Gly Thr Ala Met Arg Pro Leu Thr Ala Ala Val Thr
100 105 110

Val Ala Gly Gly His Ser Arg Tyr Val Leu Asp Gly Val Pro Arg Met
115 120 125

Arg Glu Arg Pro Ile Gly Asp Leu Val Asp Gly Leu Lys Gin Leu Gly
130 135 140

Ala Glu Val Asp Cys Ser Leu Gly Thr Asn Cys Pro Pro Val Arg Ile
145 150 155 160

Val Ser Lys Gly Gly Leu Pro Gly Gly Lys Val Lys Leu Ser Gly Ser
165 170 175

Ile Ser Ser Gin Tyr Leu Thr Ala Leu Leu Met Ala Ala Pro Leu Ala
180 185 190

Leu Gly Asp Val Glu Ile Glu Ile Ile Asp Lys Leu Ile Ser Val Pro
195 200 205

Tyr Val Glu Met Thr Leu Lys Leu Met Glu Arg Phe Gly Val Phe Val
210 215 220

Glu His Ser Ser Gly Trp Asp Arg Phe Leu Val Lys Gly Gly Gin Lys
225 230 235 240

Tyr Lys Ser Pro Gly Lys Ala Phe Val Glu Gly Asp Ala Ser Ser Ala
245 250 255

Ser Tyr Phe Leu Ala Gly Ala Ala Val Thr Gly Gly Thr Val Thr Val
260 265 270

Glu Gly Cys Gly Thr Ser Ser Leu Gin Gly Asp Val Lys Phe Ala Glu
275 280 285

Val Leu Glu Lys Met Gly Ala Glu Val Thr Trp Thr Glu Asn Ser Val
290 295 300

Thr Val Lys Gly Pro Pro Arg Asn Ser Ser Gly Met Lys His Leu Arg
305 310 315 320

Ala Ile Asp Val Asn Met Asn Lys Met Pro Asp Val Ala Met Thr Leu
325 330 335

Ala Val Val Ala Leu Phe Ala Asp Gly Pro Thr Thr Ile Arg Asp Val
340 345 350

Ala Ser Trp Arg Val Lys Glu Thr Glu Arg Met Ile Ala Ile Cys Thr
355 360 365
Glu Leu Arg Lp Leu Gly Ala Thr Val Val Glu Gly Ser Asp Tyr Cys
370 375 380

Ile Ile Thr Pro Pro Glu Lys Leu Am Val Thr Glu Ile Asp Thr Tyr
385 390 395 400

Asp Asp His Arg Met Ala Met Ala Phe Ser Leu Ala Ala Cys Ala Asp
405 410 415

Val Pro Val Thr Ile Lys Asa Pro Gly Cys Thr Arg Lys Thr Phe Pro
420 425 430

Asp Tyr Phe Glu Val Leu Oln Lys Tyr Ser Lys His
435 440

( 2 ) INFORMATION FOR SEQ ID NO:55:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 444 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:55:

Lys Pro Ser Glu Ile Val Leu Oln Pro Ile Lys Glu Ile Ser Gly Thr
1 5 10 15

Val Lys Leu Pro Gly Ser Lys Ser Leu Ser Asn Arg Ile Leu Leu Leu
20 25 30

Ala Ala Leu Ser Oln Gly Thr Thr Val Val Asp Asn Leu Leu Ser Ser
35 40 45

Asp Asp Ile His Tyr Met Leu Gly Ala Leu Lys Thr Leu Gly Leu His
50 55 60

Val Glu Glu Asp Ser Ala Asn Oln Arg Ala Val Val Oln Gly Cys Gly
65 70 75 80

Gly Leu Phe Pro Val Gly Lys Gin Ser Lys Glu Glu Ile Oln Leu Phe
85 90 95

Leu Gly Asn Ala Gly Thr Ala Met Arg Pro Leu Thr Ala Ala Val Thr
100 105 110

Val Ala Gly Gly Asn Ser Arg Tyr Val Leu Asp Gly Val Pro Arg Met
115 120 125

Arg Glu Arg Pro Ile Ser Asp Leu Val Asp Gly Leu Lys Oln Leu Gly
130 135 140

Ala Gin Val Asp Cys Phe Leu Gly Thr Lys Cys Pro Pro Val Arg Ile
145 150 155 160

Val Ser Lys Gly Gly Leu Pro Gly Gly Lys Val Lys Leu Ser Gly Ser
165 170 175

Ile Ser Ser Oln Tyr Leu Thr Ala Leu Leu Met Ala Ala Pro Leu Ala
180 185 190

Leu Gly Asp Val Glu Ile Glu Ile Ile Asp Lys Leu Ile Ser Val Pro
195 200 205

Tyr Val Glu Met Thr Leu Lys Leu Met Glu Arg Phe Gly Ile Ser Val
210 215 220

Glu His Ser Ser Ser Trp Asp Arg Phe Phe Val Arg Gly Gly Gin Lys
225 230 235 240

Tyr Lys Ser Pro Gly Lys Ala Phe Val Glu Gly Asp Ala Ser Ser Ala
245 250 255

Ser Tyr Phe Leu Ala Gly Ala Ala Val Thr Gly Gly Thr Ile Thr Val
260 265 270

Glu Gly Cys Gly Thr Asn Ser Leu Gin Gly Asp Val Lys Phe Ala Glu
275 280 285
Val Leu Glu Lys Met Gly Ala Glu Val Thr Trp Thr Glu Asa Ser Val
290 295 300

Thr Val Lys Gly Pro Pro Arg Ser Ser Ser Gly Arg Lys His Leu Arg
305 310 315 320

Ala Ile Asp Val Aid Met Asa Lys Met Pro Asp Val Ala Met Thr Leu
325 330 335

Ala Val Val Ala Leu Tyr Ala Asp Gly Pro Thr Ala Ile Arg Asp Val
340 345 350

Ala Ser Trp Arg Val Lys Glu Thr Gin Arg Met Ile Ala Ile Cys Thr
355 360 365

Glu Leu Arg Lys Leu Gly Ala Thr Val Glu Glu Gly Pro Asp Tyr Cys
370 375 380

Ile Ile Thr Pro Pro Glu Lys Leu Asn Val Thr Asp Ile Asp Thr Tyr
385 390 395 400

Asp Asp His Arg Met Ala Met Ala Phe Ser Leu Ala Ala Cys Ala Asp
405 410 415

Val Pro Val Thr Ile Asn Asp Pro Gly Cys Thr Arg Lys Thr Phe Pro
420 425 430

Asn Tyr Phe Asp Val Leu Oln Gin Tyr Ser Lys His
435 440

( 2 ) INFORMATION FOR SEQ ID NO:56:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 444 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:56:

Ala Gly Ala Glu Glu Ile Val Leu Oln Pro Ile Lys Glu Ile Ser Gly
1 5 10 15

Thr Val Lys Leu Pro Gly Ser Lys Ser Leu Ser Asn Arg Ile Leu Leu
20 25 30

Leu Ala Ala Leu Ser Glu Gly Thr Thr Val Val Asp Asn Leu Leu Asn
35 40 45

Ser Glu Asp Val His Tyr Met Leu Gly Ala Leu Arg Thr Leu Gly Leu
50 55 60

Ser Val Glu Ala Asp Lys Ala Ala Lys Arg Ala Val Val Val Gly Cys
65 70 75 80

Gly Gly Lys Phe Pro Val Glu Asp Ala Lys Glu Glu Val Gin Leu Phe
85 90 95

Leu Gly Asn Ala Gly Thr Ala Met Arg Pro Leu Thr Ala Ala Val Thr
100 105 110

Ala Ala Gly Gly Asn Ala Thr Tyr Val Leu Asp Gly Val Pro Arg Met
115 120 125

Arg Glu Arg Pro Ile Gly Asp Leu Val Val Gly Leu Lys Gin Leu Gly
130 135 140

Ala Asp Val Asp Cys Phe Leu Gly Thr Asp Cys Pro Pro Val Arg Val
145 150 155 160

Asn Gly Ile Gly Gly Leu Pro Gly Gly Lys Val Lys Leu Ser Gly Ser
165 170 175

Ile Ser Ser Gin Tyr Leu Ser Ala Leu Leu Met Ala Ala Pro Leu Pro
180 185 190

Leu Gly Asp Val Glu Ile Glu Ile Ile Asp Lys Leu Ile Ser Ile Pro
195 200 205

Tyr Val Glu Met Thr Leu Arg Leu Met Gla Arg Phe Gly Val Lys Ala
210 215 220

Glu His Ser Asp Ser Trp Asp Arg Phe Tyr Ile Lys Gly Gly Gin Lys
225 230 235 240

Tyr Lys Ser Pro Lys At a Ala Tyr Val Ola Gly Asp Ala Ser Ser Ala
245 250 255

Ser Tyr Phe Leu Ala Gly Ala Ala Ile Thr Gly Gly Thr Val Thr Val
260 265 270

Glu Gly Cys Gly Thr Thr Ser Leu Gin Gly Asp Val Lys Phe Ala Gla
275 280 285

Val Leu Glu Met Met Gly Ala Lys Val Thr Trp Thr Glu Thr Ser Val
290 295 300

Thr Val Thr Gly Pro Pro Arg Glu Pro Phe Gly Arg Lys His Leu Lys
305 310 315 320

Ala Ile Asp Val Am Met Asa Lys Met Pro Asp Val Ala Met Thr Leu
325 330 335

Ala Val Val Ala Leu Phe Ala Asp Gly Pro Thr Ala Ile Arg Asp Val
340 345 350

Ala Ser Trp Arg Val Lys Glu Thr Glu Arg Met Val Ala Ile Arg Thr
355 360 365

Glu Leu Thr Lys Leu Gly Ala Ser Val Glu Glu Gly Pro Asp Tyr Cys
370 375 380

Ile Ile Thr Pro Pro Glu Lys Leu Asa Val Thr Ala Ile Asp Thr Tyr
385 390 395 400

Asp Asp His Arg Met Ala Met Ala Phe Ser Leu Ala Ala Cys Ala Glu
405 410 415

Val Pro Val Thr Ile Arg Asp Pro Gly Cys Thr Arg Lys Thr Phe Pro
420 425 430

Asp Tyr Phe Asp Val Leu Ser Thr Phe Val Lys Asa
435 440

( 2 ) INFORMATION FOR SEQ ID NO:57:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 427 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:57:

Met Glu Ser Leu Thr Leu Gin Pro Ile Ala Arg Val Asp Gly Ala Ile
1 5 10 15

Asa Leu Pro Gly Ser Lys Ser Val Ser Asa Arg Ala Leu Leu Leu Ala
20 25 30

Ala Leu Ala Cys Gly Lys Thr Val Leu Thr Asa Leu Leu Asp Ser Asp
35 40 45

Asp Val Arg His Met Leu Asa Ala Leu Ser Ala Leu Gly Ile Asa Tyr
50 55 60

Thr Leu Ser Ala Asp Arg Thr Arg Cys Asp Ile Thr Gly Ait Gly Gly
65 70 75 80

Pro Leu Arg Ala Pro Gly Ala Leu Glu Leu Phe Leu Gly Asa Ala Gly
85 90 95

Thr Ala Met Arg Pro Leu Ala Ala Ala Leu Cys Leu Gly Gla Asn Glu
100 105 110
Ile Val Leu Tor Gly Ola Pro Arg Met Lys Gin Arg Pro Ile Gly His
115 120 125

Leu Val Asp Ser Leu Arg Gin Gly Gly Ala Am Me Asp Tyr Leu Glu
130 135 140

Oln Glu Asn Tyr Pro Pro Leu Arg Leu Arg Gly Gly Phe Ile Gly Gly
145 150 155 160

Asp Ile Ola Val Asp Gly Ser Val Ser Ser Gin Phe Leu Thr Ala Leu
165 170 175

Leu Met Thr Ala Pro Leu Ala Pro Lys Asp Thr Ile Ile Arg Val Lys
180 185 190

Gly Gin Leu Val Ser Lys Pro Tyr Ile Asp Ile Thr Leu Asn Leu Met
195 200 205

Lys Thr Phe Gly Val Glu Ile Ala Asn His His Tyr Gin Gin Phe Val
210 215 220

Val Lys Gly Gly Gin Gin Tyr His Ser Pro Gly Arg Tyr Leu Val Glu
225 230 235 240

Gly Asp Ala Ser Ser Ala Ser Tyr Phe Leu Ala Ala Gly Ala Ile Lys
245 250 255

Gly Gly Thr Val Lys Val Thr Gly Ile Gly Arg Lys Ser Met Oln Gly
260 265 270

Asp Ile Arg Phe Ala Asp Val Leu Gin Lys Met Gly Ala Thr Ile Thr
275 280 285

Trp Gly Asp Asp Phe Ile Ala Cys Thr Arg Gly Gin Leu His Ala Ile
290 295 300

Asp Met Asp Met Asn His Ile Pro Asp Ala Ala Met Thr Ile Ala Thr
305 310 315 320

Thr Ala Leu Phe Ala Lys Gly Thr Thr Thr Leu Arg Asn Ile Tyr Asn
325 330 335

Trp Arg Val Lys Oln Thr Asp Arg Leu Phe Ala Met Ala Thr Oln Leu
340 345 350

Arg Lys Val Gly Ala Oln Val Oln Oln Gly His Asp Tyr Ile Arg Ile
355 360 365

Thr Pro Pro Ala Lys Leu Gin His Ala Asp Ile Gly Thr Tyr Asn Asp
370 375 380

His Arg Met Ala Met Cys Phe Ser Leu Val Ala Leu Ser Asp Thr Pro
385 390 395 400

Val Thr Ile Leu Asp Pro Lys Cys Thr Ala Lys Thr Phe Pro Asp Tyr
405 410 415

Phe Oln Gin Leu Ala Arg Met Ser Thr Pro Ala
420 425



( 2 ) INFORMATION FOR SEQ ID NO:58:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 427 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:58:

Met Glu Ser Leu Thr Leu Gin Pro Ile Ala Arg Val Asp Gly Ala Ile
1 5 10 15

Asn Leu Pro Gly Ser Lys Ser Val Ser Asn Arg Ala Leu Leu Leu Ala
20 25 30

Ala Leu Ala Cys Gly Lys Thr Val Leu Thr Asn Leu Leu Asp Ser Asp
35 40 45
Asp Val Arg His Met Leu Am Ala Leu Ser Ala Leu Gly Ile Aid Tyr
50 55 60

Thr Leu Ser Ala Asp Arg Thr Arg Cys Asp Ile Thr Gly Asn Gly Gly
65 70 75 80

Pro Leu Arg Ala Ser Gly Thr Leu Glu Leu Phe Leu Gly Asa Ala Gly
85 90 95

Thr Ala Met Arg Pro Leu Ala Ala Ala Leu Cys Leu Gly Ola Asn Glu
100 105 110

Ile Val Leu Thr Gly Glu Pro Arg Met Lys Glu Arg Pro Ile Gly His
115 120 125

Leu Val Asp Ser Leu Arg Ola Gly Gly Ala Asn Ile Asp Tyr Leu Glu
130 135 140

Oln Glu Asn Tyr Pro Pro Leu Arg Leu Arg Gly Gly Phe Ile Gly Gly
145 150 155 160

Asp Ile Glu Val Asp Gly Ser Val Ser Ser Oln Phe Leu Thr Ala Leu
165 170 175

Leu Met Thr Ala Pro Leu Ala Pro Glu Asp Thr Ile Ile Arg Val Lys
180 185 190

Gly Glu Leu Val Ser Lys Pro Tyr Ile Asp Ile Thr Leu Asn Leu Met
195 200 205

Lys Thr Phe Gly Val Glu Ile Ala Asn His His Tyr Oln Gin Phe Val
210 215 220

Val Lys Gly Gly Oln Oln Tyr His Ser Pro Gly Arg Tyr Leu Val Glu
225 230 235 240

Gly Asp Ala Ser Ser Ala Ser Tyr Phe Leu Ala Ala Gly Gly Ile Lys
245 250 255

Gly Gly Thr Val Lys Val Thr Gly Ile Gly Gly Lys Ser Met Gin Gly
260 265 270

Asp Ile Arg Phe Ala Asp Val Leu His Lys Met Gly Ala Thr Ile Thr
275 280 285

Trp Gly Asp Asp Phe Ile Ala Cys Thr Arg Gly Glu Leu His Ala Ile
290 295 300

Asp Met Asp Met Asn His Ile Pro Asp Ala Ala Met Thr Ile Ala Thr
305 310 315 320

Thr Ala Leu Phe Ala Lys Gly Thr Thr Thr Leu Arg Asn Ile Tyr Asn
325 330 335

Trp Arg Val Lys Glu Thr Asp Arg Leu Phe Ala Met Ala Thr Glu Leu
340 345 350

Arg Lys Val Gly Ala Glu Val Glu Glu Gly His Asp Tyr Ile Arg Ile
355 360 365

Thr Pro Pro Ala Lys Leu Oln His Ala Asp Ile Gly Thr Tyr Asn Asp
370 375 380

His Arg Met Ala Met Cys Phe Ser Leu Val Ala Leu Ser Asp Thr Pro
385 390 395 400

Val Thr Ile Leu Asp Pro Lys Cys Thr Ala Lys Thr Phe Pro Asp Tyr
405 410 415

Phe Gin Oln Leu Ala Arg Met Ser Thr Pro Ala
420 425

( 2 ) INFORMATION FOR SEQ ID NO:59:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 427 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:59:

Me 1 Va 6 S Th I 1 Le Gl 145 As O 1 Va 2 2 Ol Ol A s 3 0 A 1 Tr Th Hi 3 8

Gin Ser Leu Thr Leu Ola Pro Ile Ala Arg Val Asp Gly Thr Val
5 10 15

Leu Pro Gly Ser Lys Ser Val Ser Asn Arg Ala Leu Leu Leu Ala
20 25 30

Leu Ala Arg Gly Thr Thr Val Leu Thr Asn Leu Leu Asp Ser Asp
35 40 45

Val Arg His Met Leu Asa Ala Leu Ser Ala Leu Gly Val His Tyr
50 55 60

Leu Ser Ser Asp Arg Thr Arg Cys Glu Val Thr Gly Thr Gly Gly
70 75 80

Leu Gin Ala Gly Ser Ala Leu Gin Leu Phe Leu Gly Aso Ala Gly
85 90 95

Ala Met Arg Pro Leu Ala Ala Ala Leu Cys Leu Gly Ser Asn Asp
100 105 110

Val Leu Thr Gly Glu Pro Arg Met Lys Glu Arg Pro Ile Gly His
115 120 125

Val Asp Ala Leu Arg Gin Gly Gly Ala Gin Ile Asp Tyr Leu Glu
130 135 140

Glu Asn Tyr Pro Pro Leu Arg Leu Arg Gly Gly Phe Thr Gly Gly
150 155 160

Val Glu Val Asp Gly Ser Val Ser Ser Gin Phe Leu Thr Ala Leu
165 170 175

Met Ala Ser Pro Leu Ala Pro Gin Asp Thr Val Ile Ala Ile Lys
180 185 190

Glu Leu Val Ser Arg Pro Tyr Ile Asp Ile Thr Leu His Leu Met
195 200 205

Thr Phe Gly Val Glu Val Glu Asn Gin Ala Tyr Gin Arg Phe Ile
210 215 220

Arg Gly Asn Gin Ola Tyr Oln Ser Pro Gly Asp Tyr Leu Val Gin
230 235 240

Asp Ala Ser Ser Ala Ser Tyr Phe Leu Ala Ala Gly Ala Ile Lys
245 250 255

Gly Thr Val Lys Val Thr Gly Ile Gly Arg Asn Ser Val Gin Gly
260 265 270

Ile Arg Phe Ala Asp Val Leu Glu Lys Met Gly Ala Thr Val Thr
275 280 285

Gly Glu Asp Tyr Ile Ala Cys Thr Arg Gly Gin Leu Asn Ala Ile
290 295 300

Met Asp Met Asn His Ile Pro Asp Ala Ala Met Thr Ile Ala Thr
310 315 320

Ala Leu Phe Ala Arg Gly Thr Thr Thr Leu Arg Aso Ile Tyr Asa
325 330 335

Arg Val Lys Glu Thr Asp Arg Leu Phe Ala Met Ala Thr Glu Leu
340 345 350

Lys Val Gly Ala Glu Val Gin Glu Gly Glu Asp Tyr Ile Arg Ile
355 360 365

Pro Pro Leu Thr Leu Gin Phe Ala Glu Ile Gly Thr Tyr Aso Asp
370 375 380

Arg Met Ala Met Cys Phe Ser Leu Val Ala Leu Ser Asp Thr Pro
390 395 400
Val Thr Ile Leu Asp Pro Lys Cys Thr Ala Lys Tor Phe Pro Asp Tyr
405 410 415

Phe Gly Gin Leu Ala Arg Ile Ser Thr Leu Ala
420 425

( 2 ) INFORMATION FOR SEQ ID NO:60:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 427 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:60:

Met Leu Ola Ser Leu Thr Leu His Pro Ile Ala Leu Ile Asn Gly Thr
1 5 10 15

Val Asn Leu Pro Gly Ser Lys Ser Val Ser Asn Arg Ala Leu Leu Leu
20 25 30

Ala Ala Leu Ala Gin Gly Thr Thr Gin Leu Asn Asa Leu Leu Asp Ser
35 40 45

Asp Asp Ile Arg His Met Leu Asa Ala Leu Gla Ala Leu Gly Val Lys
50 55 60

Tyr Arg Leu Ser Ala Asp Arg Thr Arg Cys Ola Val Asp Gly Leu Gly
65 70 75 80

Gly Lys Leu Val Ala Ola Gin Pro Leu Glu Leu Phe Leu Gly Asa Ala
85 90 95

Gly Thr Ala Met Arg Pro Leu Ala Ala Ala Leu Cys Leu Gly Lys Asn
100 105 110

Asp Ile Val Leu Thr Gly Gla Pro Arg Met Lys Glu Arg Pro Ile Gly
115 120 125

His Leu Val Asp Ala Leu Arg Ola Gly Gly Ala Ola Ile Asp Tyr Leu
130 135 140

Glu Gla Ola Asn Tyr Arg Arg Cys Ile Ala Gly Gly Phe Arg Gly Gly
145 150 155 160

Lys Leu Thr Val Asp Gly Ser Val Ser Ser Ola Phe Leu Thr Ala Leu
165 170 175

Leu Met Thr Ala Pro Leu Ala Glu Oln Asp Thr Gin Ile Gin Ile Gin
180 185 190

Gly Ola Leu Val Ser Lys Pro Tyr Ile Asp Ile Thr Leu His Leu Met
195 200 205

Lys Ala Phe Gly Val Asp Val Val His Gla Asn Tyr Oln Ile Phe His
210 215 220

Ile Lys Gly Gly Gin Thr Tyr Arg Ser Pro Gly Ile Tyr Leu Val Gla
225 230 235 240

Gly Asp Ala Ser Ser Ala Ser Tyr Phe Leu Ala Ala Ala Ala Ile Lys
245 250 255

Gly Gly Thr Val Arg Val Thr Gly Ile Gly Lys Ola Ser Val Gin Gly
260 265 270

Asp Thr Lys Phe Ala Asp Val Leu Gla Lys Met Gly Ala Lys Ile Ser
275 280 285

Trp Gly Asp Asp Tyr Ile Ola Cys Ser Arg Gly Oln Leu Oln Gly Ile
290 295 300

Asp Met Asp Met Asa His Ile Pro Asp Ala Ala Met Thr Ile Ala Thr
305 310 315 320

Thr Ala Leu Phe Ala Asp Gly Pro Thr Val Ile Arg Asn Ile Tyr Asn
325 330 335
Trp Arg Val Lys Ola Thr Asp Arg Leu Ser Ala Met Ala Thr Gin Leu
340 345 350

Arg Lys Val Gly Ala Glu Val Glu Gin Gly Gin Asp Tyr Ile Arg Val
355 360 365

Val Pro Pro Ala Gin Leu Ile Ala Ala Gin Ile Gly Thr Tyr Am Asp
370 375 380

His Arg Met Ala Met Cys Phe Ser Leu Val Ala Leu Ser Asp Thr Pro
385 390 395 400

Val Thr Ile Leu Asp Pro Lys Cys Thr Ala Lys Thr Phe Pro Asp Tyr
405 410 415

Phe Gin Gin Leu Ala Arg Leu Ser Gin Ile Ala
420 425

( 2 ) INFORMATION FOR SEQ ID NO:61:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 432 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:61:

Met Oln Lys Ile Thr Leu Ala Pro Ile Ser Ala Val Gin Gly Thr Ile
1 5 10 15

Asn Leu Pro Gly Ser Lys Ser Leu Ser Asn Arg Ala Leu Leu Leu Ala
20 25 30

Ala Leu Ala Lys Gly Thr Thr Lys Val Thr Asn Leu Leu Asp Ser Asp
35 40 45

Asp Ile Arg His Met Leu Asn Ala Leu Lys Ala Leu Gly Val Arg Tyr
50 55 60

Gin Leu Ser Asp Asp Lys Thr Ile Cys Gin Ile Glu Gly Leu Gly Gly
65 70 75 80

Ala Phe Asn Ile Oln Asp Asn Leu Ser Leu Phe Leu Gly Asn Ala Gly
85 90 95

Thr Ala Met Arg Pro Leu Thr Ala Ala Leu Cys Leu Lys Gly Asn His
100 105 110

Gin Val Glu Ile Ile Leu Thr Gly Glu Pro Arg Met Lys Glu Arg Pro
115 120 125

Ile Leu His Leu Val Asp Ala Leu Arg Gin Ala Gly Ala Asp Ile Arg
130 135 140

Tyr Leu Gin Asn Glu Gly Tyr Pro Pro Leu Ala Ile Arg Asn Lys Gly
145 150 155 160

Ile Lys Gly Gly Lys Val Lys Ile Asp Gly Ser Ile Ser Ser Gin Phe
165 170 175

Leu Thr Ala Leu Leu Met Ser Ala Pro Leu Ala Gin Asn Asp Thr Gin
180 185 190

Ile Glu Ile Ile Gly Glu Leu Val Ser Lys Pro Tyr Ile Asp Ile Thr
195 200 205

Leu Ala Met Met Arg Asp Phe Gly Val Lys Val Gin Asn His His Tyr
210 215 220

Gin Lys Phe Oln Val Lys Gly Asn Gin Ser Tyr Ile Ser Pro Asn Lys
225 230 235 240

Tyr Leu Val Gin Gly Asp Ala Ser Ser Ala Ser Tyr Phe Leu Ala Ala
245 250 255

Gly Ala Ile Lys Gly Lys Val Lys Val Thr Gly Ile Gly Lys Asn Ser
260 265 270

Ile Gin Gly Asp Arg Leu Phe Ala Asp Val Leu Glu Lys Met Gly Ala
275 280 285

Lys Ile Thr Trp Gly Glu Asp Phe Ile Gin Ala Glu His Ala Glu Leu
290 295 300

Asn Gly Ile Asp Met Asp Met Asa His Ile Pro Asp Ala Ala Met Thr
305 310 315 320

Ile Ala Thr Thr Ala Leu Phe Ser Asn Gly Glu Thr Val Ile Arg Asn
325 330 335

Ile Tyr Asn Trp Arg Val Lys Glu Thr Asp Arg Leu Thr Ala Met Ala
340 345 350

Thr Glu Leu Arg Lys Val Gly Ala Glu Val Glu Glu Gly Glu Asp Phe
355 360 365

Ile Arg Ile Ols Pro Leu Ala Leu Asn Gin Phe Lys His Ala Asn Ile
370 375 380

Glu Thr Tyr Asn Asp His Arg Met Ala Met Cys Phe Ser Leu Ile Ala
385 390 395 400

Leu Ser Asn Thr Pro Val Thr Ile Leu Asp Pro Lys Cys Thr Ala Lys
405 410 415

Thr Phe Pro Thr Phe Phe Asn Glu Phe Glu Lys Ile Cys Leu Lys Asn
420 425 430

( 2 ) INFORMATION FOR SEQ ID NO:62:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 441 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:62:

Val Ile Lys Asp Ala Thr Ala Ile Thr Leu Asn Pro Ile Ser Tyr Ile
1 5 10 15

Glu Gly Glu Val Arg Leu Pro Gly Ser Lys Ser Leu Ser Asn Arg Ala
20 25 30

Leu Leu Leu Ser Ala Leu Ala Lys Gly Lys Thr Thr Leu Thr Asn Leu
35 40 45

Leu Asp Ser Asp Asp Val Arg His Met Leu Asn Ala Leu Lys Glu Leu
50 55 60

Gly Val Thr Tyr Gin Leu Ser Glu Asp Lys Ser Val Cys Gin Ile Glu
65 70 75 80

Gly Leu Gly Arg Ala Phe Glu Trp Gin Ser Gly Leu Ala Leu Phe Leu
85 90 95

Gly Asn Ala Gly Thr Ala Met Arg Pro Leu Thr Ala Ala Leu Cys Leu
100 105 110

Ser Thr Pro Asn Arg Ola Gly Lys Asn Glu Ile Val Leu Thr Gly Glu
115 120 125

Pro Arg Met Lys Glu Arg Pro Ile Gin His Leu Val Asp Ala Leu Cys
130 135 140

Gin Ala Gly Ala Glu Ile Gin Tyr Leu Glu Gin Glu Gly Tyr Pro Pro
145 150 155 160

Ile Ala Ile Arg Asn Thr Gly Leu Lys Gly Gly Arg Ile Gin Ile Asp
165 170 175

Gly Ser Val Ser Ser Gin Phe Leu Thr Ala Leu Leu Met Ala Ala Pro
180 185 190
Met Ala Gin Ala Asp Thr Glu Ile Glu Ile Ile Gly Glu Leu Val Ser
195 200 205

Lys Pro Tyr Ile Asp Ile Thr Leu Lys Met Met Oln Thr Phe Gly Val
210 215 220

Glu Val Glu Aso Oln Ala Tyr Oln Arg Phe Leu Val Lys Gly His Ola
225 230 235 240

Oln Tyr Oln Ser Pro His Arg Phe Leu Val Glu Gly Asp Ala Ser Ser
245 250 255

Ala Ser Tyr Phe Leu Ala Ala Ala Ala Ile Lys Gly Lys Val Lys Val
260 265 270

Thr Gly Val Gly Lys Asa Ser Ile Oln Gly Asp Arg Leu Phe Ala Asp
275 280 285

Val Leu Glu Lys Met Gly Ala His Ile Thr Trp Gly Asp Asp Phe Ile
290 295 300

Oln Val Glu Lys Gly Asn Leu Lys Gly Ile Asp Met Asp Met Aso His
305 310 315 320

Ile Pro Asp Ala Ala Met Thr Ile Ala Thr Thr Ala Leu Phe Ala Glu
325 330 335

Gly Glu Thr Val Ile Arg Asn Ile Tyr Asn Trp Arg Val Lys Glu Thr
340 345 350

Asp Arg Leu Thr Ala Met Ala Thr Ola Leu Arg Lys Val Gly Ala Glu
355 360 365

Val Glu Glu Gly Glu Asp Phe Ile Arg Ile Oln Pro Leu Asa Leu Ala
370 375 380

Oln Phe Gin His Ala Glu Leu Asn Ile His Asp His Arg Met Ala Met
385 390 395 400

Cys Phe Ala Leu Ile Ala Leu Ser Lys Thr Ser Val Thr Ile Leu Asp
405 410 415

Pro Ser Cys Thr Ala Lys Thr Phe Pro Thr Phe Leu Ile Leu Phe Thr
420 425 430

Leu Asn Thr Arg Glu Val Ala Tyr Arg
435 440

( 2 ) INFORMATION FOR SEQ ID NO:63:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 426 amino acids

( B ) TYPE: amino acid

( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:63:

Asn Ser Leo Arg Lea Oln Pro lie Ser Arg Val Ala Gly Gin Val Asn 
15 10 15 

Leo Pro Gly Ser Lys Ser Val Ser Asn Arg Ala Leo Leo Leu Ala Ala 

20 2 5 3 0 

Leo Ala Arg Gly Thr Thr Arg Leo Thr Asn Leo Leo Asp Ser Asp Asp 
3 5 4 0 4 5 

lie Arg His Met Leo Ala Ala Leo Tbr Gin Leo Gly Val Lys Tyr Lys 
5 0 5 5 6 0 

Leo Ser Ala Asp Lys Tbr Glo Cys Tbr Val His Gly Leo Gly Arg Ser 
65 70 75 80 

Phe Ala Val Ser Ala Pro Val Asn Leo Phe Leo Gly Asn Ala Oly Thr 
85 90 95 

Ala Met Arg Pro Leo Cyi Ala Ala Leo Cys Leo Oly Ser Oly Glo Tyr 
100 105 110

Met Leu Gly Gly OU Pro Arg Met Olo Glo Arg Pro Ile Gly His Leu
115 12 0 12 5 

Val Asp Cys Leo Ala Leo Ly* Oly Ala Hit lie OU Tyr Leo Lys Lys 
13 0 13 5 140 

Asp Oly Tyr Pro Pro Leo Val Val Asp Ala Ly$ Oly Leo Tip Oly Gly 
145 150 155 160 

Asp Val His Val Asp Oly Ser Val Sei Ser Ola Phe Leo Thr Ala Phe 
165 170 175 

Leo Met Ala Ala Pro Ala Met Ala Pro Val lie Pro Arg lie His lie 
180 185 190 

Lys Oly Olo Leo Val Ser Lyi Pro Tyr lie Asp lie Thr Leo His lie 
195 200 205 

Met Asa Ser Ser Oly Val Val lie Olo Hit Asp Asa Tyr Lys Leo Phe 
210 215 220 

Tyr lie Lys Oly Asa Ola Ser lie Val Ser Pro Oly Asp Phe Leo Val 
225 230 235 240 

Olo Oly Asp Ala Ser Ser Ala Ser Tyr Phe Leo Ala Ala Oly Ala lie 
245 250 255 

Lyt Oly Lys Val Arg Val Thr Oly lie Oly Lys His Ser 11c Gly Asp 
260 265 270 

lie His Phe Ala Atp Val Leo Olo Arg Met Gly Ala Arg lie Thr Trp 
275 280 285 

Oly Asp Asp Phe lie Glo Ala Olo Glo Oly Pro Leu Hit Oly Val Asp 
290 295 300 

Met Asp Met Asa His lie Pro Asp Val Oly His Asp Hit Ser Oly Ola 
305 310 315 320 

Ser His Cys Leo Pro Arg Val Pro Pro Hit Ser Gin Hit Leo Gin Leo 
325 330 335 

Ala Val Arg Asp Atp Arg Cyt Thr Pro Cyt Thr His Oly His Arg Arg 

340 345 350 

Ala Oln Ala Oly Val Ser Glo Glo Oly Thr Thr Phe lie Thr Arg Atp 

355 360 365 

Ala Ala Asp Pro Ala Gla Ala Arg Arg Asp Arg Hit Leo Gla Arg Ser 
370 375 380 

Arg lie Ala Met Cys Phe Ser Leo Val Ala Leo Ser Asp lie Ala Val 
385 390 395 400 

Thr lie Asn Asp Pro Gly Cys Thr Ser Lys Thr Phe Pro Asp Tyr Phe 
405 410 415 

Asp Lys Len Ala Ser Val Ser Gin Ala Val 
420 425 

( 2 ) INFORMATION FOR SEQ ID NO:64:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 442 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:64:

Met Ser Gly Leo Ala Tyr Len Atp Leo Pro Ala Ala Arg Leo Ala Arg 
1 5 10 15 

Gly Glo Val Ala Leo Pro Oly Ser Lys Ser lie Ser Asn Arg Val Leu 
20 2 5 3 0 

Leu Leu Ala Ala Leu Ala Glo Gly Ser Thr Olo Ile Thr Gly Leu Leu
35 40 45

Asp Ser Asp Aap Thr Arg Val Met Lea Ala Ala Leo Arg Gin Leu Gly 
50 5 5 60 

Val Ser Val Oly Gin Val Ala Aap Gly Cys Val Thr lie Gin Gly Val 
65 70 75 80 

Ala Aig Phc Pro Tor Gin Gin Ala Gin Len Phc Leo Gly Asn Ala Gly 
85 90 95 

Thr Ala Phc Arg Pro Len Thr Ala Ala Len Ala Len Met Gly Gly Aip 
10 0 105 110 

Tyr Arg Leu Ser Gly Val Pro Arg Met Hia Gin Arg Pro lie Gly Asp 
115 12 0 125 

Len Val Asp Ala Len Arg Gin Phe Gly Ala Gly lie Gin Tyr Len Gly 
13 0 13 5 140 

Gin Ala Gly Tyr Pro Pro Len Arg lie Oly Oly Gly Ser lie Arg Val 
145 150 155 160 

Asp Gly Pro Val Arg Val Ola Gly Ser Val Ser Ser Oln Phe Len Thr 
16 5 17 0 175 

Ala Len Leo Met Ala Ala Pro Val Len Ala Arg Arg Ser Oly Oln Asp 
18 0 18 5 19 0 

lie Thr lie Oln Val Val Oly Oln Len lie Ser Lya Pro Tyr lie Glo 
195 200 205 

lie Thr Leo Asn Leo Met Ala Arg Phe Oly Val Ser Val Arg Arg Asp 
2 10 2 15 2 2 0 

Oly Trp Arg Ala Phe Thr lie Ala Arg Aap Ala Val Tyr Arg Oly Pro 
225 230 235 240 

Oly Arg Met Ala lie Gin Gly Asp Ala Ser Thr Ala Ser Tyr Phe Leu 
245 250 255 

Ala Leo Oly Ala lie Gly Gly Gly Pro Val Arg Val Thr Oly Val Oly 
260 265 270 

Gin Asp Ser lie Gin Gly Asp Val Ala Phe Ala Ala Thr Len Ala Ala 
275 280 285 

Met Gly Ala Asp Val Arg Tyr Gly Pro Oly Trp lie Oln Thr Arg Oly 
290 295 300 

Val Arg Val Ala Olo Gly Gly Arg Len Lys Ala Phe Asp Ala Asp Phe 

305 310 315 320 

Asn Leo lie Pro Asp Ala Ala Met Thr Ala Ala Thr Leo Ala Leo Tyr 

325 330 335 

Ala Asp Gly Pro Cys Arg Leo Arg Asn lie Gly Ser Trp Arg Val Lya 
340 3 45 350 

Gin Thr Asp Arg lie His Ala Met His Thr Oln Leo Olo Lys Leo Oly 

355 360 365 

Ala Oly Val Oln Ser Gly Ala Asp Trp Leo Glo Val Ala Pro Pro Olo 
370 375 380 

Pro Gly Gly Trp Arg Asp Ala His lie Oly Thr Trp Asp Asp His Arg 
385 390 395 400 

Met Ala Met Cys Phe Leo Leo Ala Ala Phe Oly Pro Ala Ala Val Arg 
40 5 4 10 4 15 

lie Leo Aap Pro Oly Cya Val Ser Lys Thr Phe Pro Asp Tyr Phe Asp 
420 425 430 

Val Tyr Ala Oly Leo Len Ala Ala Arg Asp 
43 5 440 



( 2 ) INFORMATION FOR SEQ ID NO:65:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 427 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:65:

Mot OU Ser Lou Thr Leo OU Pro lie Ala Aig Val Asp Oly Ala lie 
15 10 15 

Asa Leo Pio Gly Ser Lys Ser Val Ser Aan Arg Ala Leo Leo Leo Ala 
2 0 2 5 3 0 

Ala Leo Ala Cys Oly Lys Tbr Val Leo Thr Asn Leo Leo Asp Ser Asp 
35 40 45 

Asp Val Arg His Met Leo Asa Ala Leo Ser Ala Leo Oly lie Asa Tyr 
5 0 5 5 60 

Tbr Lea Ser Ala Asp Arg Thr Arg Cys Asp lie Thr Oly Asa Oly Oly 
65 70 75 80 

Pro Leo Arg Ala Ser Oly Thr Leo Ola Leo Phe Leo Oly Asa Ala Oly 
8 3 90 95 

Thr Ala Met Arg Pro Leo Ala Ala Ala Leo Cys Leo Oly Ola Asa Glo 
10 0 10 5 110 

lie Val Leo Thr Oly Olo Pro Arg Met Lys Olo Arg Pro lie Gly His 
115 12 0 12 5 

Leo Val Asp Ser Leo Arg Gla Gly Gly Ala Asa lie Asp Tyr Leo Glo 
130 135 140 

Ola Glo Asa Tyr Pro Pro Leo Arg Leo Arg Gly Oly Phe lie Gly Gly 
145 150 135 160 

Asp lie Gin Val Asp Gly Ser Val Ser Ser Gla Phe Leo Thr Ala Leo 
165 170 175 

Leo Met Tbr Ala Pro Leo Ala Pro Glo Asp Thr lie lie Arg Val Lys 
1 8 0 18 5 190 

Gly Glo Leo Val Ser Lys Pro Tyr lie Asp lie Thr Leo Asa Leo Met 
195 200 205 

Lys Thr Phe Gly Val Glo lie Ala Asn His His Tyr Ola Gla Phe Val 
2 10 2 15 2 2 0 

Val Lys Gly Gly Ola Gla Tyr His Ser Pro Gly Arg Tyr Leo Val Olo 

225 230 235 2 4.0 

Gly Asp Ala Ser Ser Ala Ser Tyr Phe Leo Ala Ala Gly Oly lie Lys 
245 250 25 5 

Oly Gly Thr Val Lys Val Thr Oly lie Oly Oly Lys Ser Met Oltt Oly 
260 265 270 

Asp lie Arg Phe Ala Asp Val Leo His Lys Met Oly Ala Thr lie Thr 
275 280 285 

Trp Gly Asp Asp Phe lie Ala Cys Thr Arg Oly Olo Leo His Ala lie 
290 295 300 

Asp Met Asp Met Asa His lie Pro Asp Ala Ala Met Thr lie Ala Thr 
305 3 10 315 320 

Thr Ala Leo Phe Ala Lys Oly Thr Thr Thr Leo Arg Asa lie Tyr Asa 

325 330 333 

Trp Arg Val Lys Glo Tbr Asp Arg Leo Phe Ala Met Ala Thr Glo Leo 
340 345 350 

Arg Lys Val Gly Ala Glo Val Glo Olo Oly His Asp Tyr lie Arg lie 
355 360 365 

Thr Pro Pro Ala Lys Leo Ola His Ala Asp lie Gly Thr Tyr Asa Asp 
370 375 380

His Arg Met Ala Met Cys Phe Ser Leu Val Ala Leu Ser Asp Thr Pro
385 390 395 400

Val Thr Ile Leu Asp Pro Lys Cys Thr Ala Lys Thr Phe Pro Asp Tyr
405 410 415

Phe Ola Oln Leu Ala Arg Met Ser Thr Pro Ala
420 425



( 2 ) INFORMATION FOR SEQ ID NO:66:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 1894 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( i x ) FEATURE:

( A ) NAME/KEY: CDS
( B ) LOCATION: 275-1618

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:66:

ACOOOCTOTA ACGOTAGTAO OGGTCCCGAG CACAAAAGCG OTGCCGOCAA GCAOAACTAA 60

TTTCCATOGO GAATAATGGT ATTTCATTGG TTTGGCCTCT GOTCTOOCAA TGOTTGCTAO 120

GCOATCOCCT OTTGAAATTA ACAAACTGTC GCCCTTCCAC TGACCATGGT AACGATGTTT 180

TTTACTTCCT TOACTAACCG AGGAAAATTT GGCGGGGGGC AGAAATGCCA ATACAATTTA 240

GCTTGGTCTT CCCTGCCCCT AATTTGTCCC CTCC ATO OCC TTG CTT TCC CTC 292
Met Ala Leu Leu Ser Leu
1 5

AAC AAT CAT CAA TCC CAT CAA CGC TTA ACT GTT AAT CCC CCT OCC CAA 340
Asn Asn His Gln Ser His Gln Arg Leu Thr Val Asn Pro Pro Ala Gln
10 15 20

GOO OTC OCT TTG ACT GGC CGC CTA AGG GTG CCG GOO GAT AAA TCC ATT 388
Gly Val Ala Leu Thr Gly Arg Leu Arg Val Pro Gly Asp Lys Ser Ile
25 30 35

TCC CAT CGO OCC TTG ATG TTG OGG GCG ATC OCC ACC GOG GAA ACC ATT 436
Ser His Arg Ala Leu Met Leu Gly Ala Ile Ala Thr Gly Glu Thr Ile
40 45 50

ATC GAA GOO CTA CTG TTG GOG GAA GAT CCC COT AGT ACQ OCC CAT TGC 484
Ile Glu Gly Leu Leu Leu Gly Glu Asp Pro Arg Ser Thr Ala His Cys
55 60 65 70

TTT CGO OCC ATG GGA OCA GAA ATC AOC GAA CTA AAT TCA GAA AAA ATC 532
Phe Arg Ala Met Gly Ala Glu Ile Ser Glu Leu Asn Ser Glu Lys Ile
75 80 85

ATC GTT CAG GGT COG GOT CTG GGA CAG TTG CAG GAA CCC AGT ACC GTT 580
Ile Val Gln Gly Arg Gly Leu Gly Gln Leu Gln Glu Pro Ser Thr Val
90 95 100

TTG GAT OCO OGG AAC TCT GGC ACC ACC ATG CGC TTA ATG TTG GGC TTG 628
Leu Asp Ala Gly Asn Ser Gly Thr Thr Met Arg Leu Met Leu Gly Leu
105 110 115

CTA OCC GGG CAA AAA GAT TOT TTA TTC ACC GTC ACC GGC OAT GAT TCC 676
Leu Ala Gly Gln Lys Asp Cys Leu Phe Thr Val Thr Gly Asp Asp Ser
120 125 130

CTC CGT CAC CGC CCC ATG TCC COG GTA ATT CAA CCC TTG CAA CAA ATG 724
Leu Arg His Arg Pro Met Ser Arg Val Ile Gln Pro Leu Gln Gln Met
135 140 145 150

GOO GCA AAA ATT TGO OCC CGG AGT AAC GGC AAO TTT OCG CCG CTG OCA 772
Gly Ala Lys Ile Trp Ala Arg Ser Asn Gly Lys Phe Ala Pro Leu Ala
155 160 165

OTC CAG GOT AGC CAA TTA AAA CCG ATC CAT TAC CAT TCC CCC ATT GCT 820
Val Gln Gly Ser Gln Leu Lys Pro Ile His Tyr His Ser Pro Ile Ala
170 175 180

TCA GCC CAO OTA AAO TCC TOC CTO TTG CTA OCO OOO TTA ACC ACC OAO 868
Ser Ala Gln Val Lys Ser Cys Leu Leu Leu Ala Gly Leu Thr Thr Olu
185 190 195

GOG GAC ACC ACQ GTT ACA OAA CCA GCT CTA TCC COO OAT CAT AOC OAA 916
Gly Asp Thr Thr Val Thr Gin Pro Ala Leu Ser Arg Asp His Ser Olu
200 205 210

CGC ATG TTG CAG GCC TTT OOA OCC AAA TTA ACC ATT GAT CCA OTA ACC 964
Arg Met Leu Gln Ala Phe Gly Ala Lys Leu Thr Ile Asp Pro Val Thr
215 220 225 230

CAT AGC OTC ACT OTC CAT OOC CCG GCC CAT TTA ACQ OOO CAA COO GTG 1012
His Ser Val Thr Val His Gly Pro Ala His Leu Thr Gly Gln Arg Val
235 240 245

GTG GTO CCA GGG GAC ATC AGC TCG GCO OCC TTT TOG TTA OTO GCO GCA 1060
Val Val Pro Gly Asp Ile Ser Ser Ala Ala Phe Trp Leu Val Ala Ala
250 255 260

TCC ATT TTG CCT OOA TCA OAA TTG TTG GTO OAA AAT OTA OOC ATT AAC 1108
Ser Ile Leu Pro Gly Ser Gin Leu Leu Val Oln Asn Val Gly Ile Asn
265 270 275

CCC ACC AGO ACA GGG GTO TTG OAA OTG TTG GCC CAG ATO GGG GCO GAC 1156
Pro Thr Arg Thr Gly Val Leu Oln Val Leu Ala Gln Met Gly Ala Asp
280 285 290

ATT ACC CCG GAG AAT OAA CGA TTO GTA ACO GGG OAA CCG GTA GCA GAT 1204
Ile Thr Pro Glu Asn Oln Arg Leu Val Thr Gly Olu Pro Val Ala Asp
295 300 305 310

CTG CGG GTT AGO OCA AOC CAT CTC CAO GOT TOC ACC TTC OOC OOC OAA 1252
Leu Arg Val Arg Ala Ser His Leu Gln Gly Cys Thr Phe Gly Gly Oln
315 320 325

ATT ATT CCC COA CTG ATT GAT OAA ATT CCC ATT TTG OCA OTG OCO GCO 1300
Ile Ile Pro Arg Leu Ile Asp Gin Ile Pro Ile Leu Ala Val Ala Ala
330 335 340

GCC TTT GCA OAO OGC ACT ACC CGC ATT OAA CAT GCC OCA OAA CTG AGO 1348
Ala Phe Ala Gin Gly Thr Thr Arg Ile Gin Asp Ala Ala Oln Leu Arg
345 350 355

GTT AAA GAA AOC GAT COC CTG OCG OCC ATT OCT TCO OAO TTG OOC AAA 1396
Val Lys Glu Ser Asp Arg Leu Ala Ala Ile Ala Ser Oln Leu Gly Lys
360 365 370

ATG GOG GCC AAA OTC ACC OAA TTT OAT GAT OGC CTG OAA ATT CAA OOO 1444
Met Gly Ala Lys Val Thr Olu Phe Asp Asp Gly Leu Olu Ile Gln Gly
375 380 385 390

OOA AGC CCG TTA CAA GOO OCC GAG GTO OAT AOC TTG ACG GAT CAT CGC 1492
Gly Ser Pro Leu Gln Gly Ala Oln Val Asp Ser Leu Thr Asp His Arg
395 400 405

ATT GCC ATG OCG TTG GCO ATC GCC GCT TTA GOT AOT OGO GOG CAA ACA 1540
Ile Ala Met Ala Leu Ala Ile Ala Ala Leu Gly Ser Gly Gly Gln Thr
410 415 420

ATT ATT AAC CGG OCG OAA GCO GCC GCC ATT TCC TAT CCA GAA TTT TTT 1588
Ile Ile Asn Arg Ala Gin Ala Ala Ala Ile Ser Tyr Pro Gin Phe Phe
425 430 435



GOC ACG CTA GOG CAA GTT GCC CAA OOA TAAAOTTAGA AAAACTCCTG 1635

Gly Thr Leu Gly Gln Val Ala Gln Gly
440 445

GOCOOTTTOT AAATGTTTTA CCAAOGTAGT TTGOGGTAAA GGCCCCAOCA AGTGCTOCCA 1695
GGOTAATTTA TCCOCAATTG ACCAATCGGC ATOGACCOTA TCOTTCAAAC TGGGTAATTC 1755
TCCCTTTAAT TCCTTAAAAG CTCOCTTAAA ACTOCCCAAC OTATCTCCOT AATOGCGAGT 1815
GAOTAGAAGT AATGOGGCCA AACGGCGATC GCCACGGGAA ATTAAAGCCT OCATCACTOA 1875
CCACTTATAA CTTTCOOOA 1894



( 2 ) INFORMATION FOR SEQ ID NO:67:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 447 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:67:

Met Ala Leo Leo Sci Leo Asn Am His 



G 1 n 
1 0 



Sex His G 1 o 



Arg Leo Thr 
1 5 



Val Asn Pro Pro Ala Gin Gly 
2 0 



Val 



Al a 

2 5 



Leo Thr Gly Arg 



Leo Arg Val 

3 0 



Pro Gly Asp Lys Ser lie Set 
3 5 



50 



Gin Glo Pro Ser Thr Val Le 
1 0 0 



Hi s 
40 



A 1 a 



Ala Thr Gly Glo Thr lie lie Glo 



5 5 



Arg Sor Thr Ala His Cys Phe Arg 



Leo Asn Ser Glo Lys lie lie 
85 



o Asp 



Ar g 
Gly 
Ala Met 



Leo 



Val Gin 



Al a 
1 0 5 



Gly 
90 

Gly 



Leu Met Leo 
45 

Leo Leo Gly 
60 

Gly Ala Glo 
75 

Arg Gly Leo 

Asn Ser Gly 



Gly Ala lie 
Glo Asp Pro 



lie Ser Glo 
8 0 

Gly Gin Leu 
95 

Thr Thr Me t 
1 1 0 



Arg Leo Met Leo Gly Leo Leo 
115 



A 1 a 
1 2 0 



Gly Gin 



Val Thr Gly Asp Asp Ser Leo Arg His Arg 



1 3 0 



1 3 5 



Gin Pro Leo Gin Gin Met Gly Ala 



145 



1 5 0 



Lys Phe Ala Pro Leo Ala Val 
165 

Tyr His Ser Pro lie Ala Ser 
18 0 

Ala Gly Leu Thr Thr Glo Gly 
195 



Lys 
Gin Ol y 



Al a 



Asp 

2 0 0 



2 1 0 



2 1 5 



2 2 5 



2 3 0 



Leo Thr Gly Gin Arg Val Val 
245 



Phe Trp Leo Val Ala Ala So 
2 60 



r lie 



Glo Asn Val Gly lie Asa Pro 

2 7 5 



Thr 
2 8 0 



2 9 0 



2 95 



3 05 



3 1 0 



Cys Thr Phe Gly Gly Olo lie 

3 25 

lie Leo Ala Val Ala Ala Ala 
3 40 



Arg 
I 1 e 
Phe 



I 1 e 



Ser 
1 7 0 



Ol n 
18 5 



Va 1 



Thr Thr 



Ser Arg Asp His Ser Olu Arg Met Leo Gin 



Thr lie Asp Pro Val Thr His Ser Val Thr 



Val Pro 



Gly 
2 5 0 



Leo Pro 
2 6 5 

Arg Thr 



Ala Gin Met Gly Ala Asp lie Thr Pro Glo 



Gly Glo Pro Val Ala Asp Leo Arg Val Arg 



Lys Asp Cys 
125 

Pro Met Ser 
140 

Trp Ala Arg 
155 

Gin Leo Lys 

Lys Ser Cys 

Val Thr Glo 
2 05 

Ala Phe Gly 

2 2 0 

Val His Gly 

235 

Asp tie Ser 

Gly Ser Glo 



Gly Val Leo 
2 8 5 

Asn Gin Arg 
3 0 0 

Ala Ser His 

3 I 5 



Pro 



Al a 
3 45 



Ar g 

3 3 0 



Leo lie Asp 
Glo Gly Thr Thr 



Leu Phe Thr 



Ar g Va 1 II* 



Sez Asn Gly 
16 0 

Pro lie His 
175 

Leo Leo Leo 
19 0 

Pro Ala Loo 



Ala Lys Leo 



Pro Ala His 
2 4 0 



Ser Ala Ala 

2 5 5 



Leu Leo Val 
2 70 



Olu Val Leo 



Leo Val Thr 



Leo Gin Gly 
3 2 0 

Olo lie Pro 
3 3 5 

Arg lie Gin 
3 50 






Asp Ala Ala OU Leo Arg Val Lya Olu Set Asp Arg Lou Ala Ala lie 
355 360 365 

Ala Ser Gin Leo Oly Lyi Met Giy Ala Lys Val Thr OU Phe Asp Asp 
370 375 380 

Oly Leu Olu lie Oln Oly Gly Ser Pro Leo Gin Oly Ala Olo Val Asp 
385 390 395 400 

Ser Leo Thr Asp His Arg lie Ala Met Ala Leo Ala lie Ala Ala Leo 
405 4 10 415 

Gly Ser Gly Oly Ola Thr lie lie Asa Arg Ala Olo Ala Ala Ala lie 
420 425 430 

Ser Tyr Pro Olo Phe Phe Oly Thr Leo Oly Oln Val Ala Oln Gly 
435 440 445 

( 2 ) INFORMATION FOR SEQ ID NO:68:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 1479 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( i x ) FEATURE:

( A ) NAME/KEY: CDS
( B ) LOCATION: 107-1438

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:68:

TTTAAAAACA ATOAGTTAAA AAATTATTTT TCTOOCACAC GCOCTTTTTT TOCATTTTTT 60

CTCCCATTTT TCCGGCACAA TAACOTTOOT TTTATAAAAG OAAATG ATG ATO ACO 115
Met Met Thr
1

AAT ATA TOO CAC ACC OCG CCC GTC TCT GCO CTT TCC GOC GAA ATA ACO 163
Asn Ile Trp His Thr Ala Pro Val Ser Ala Leu Ser Gly Glu Ile Thr
5 10 15

ATA TOC OOC GAT AAA TCA ATO TCO CAT COC GCC TTA TTA TTA OCA OCG 211
Ile Cys Gly Asp Lys Ser Met Ser His Arg Ala Leu Leu Leu Ala Ala
20 25 30 35

TTA GCA GAA OGA CAA ACO GAA ATC COC OOC TTT TTA OCO TOC OCO OAT 259
Leu Ala Glu Gly Gln Thr Glu Ile Arg Gly Phe Leu Ala Cys Ala Asp
40 45 50

TGT TTO GCG ACO COG CAA GCA TTG COC OCA TTA OOC GTT OAT ATT CAA 307
Cys Leu Ala Thr Arg Gln Ala Leu Arg Ala Leu Gly Val Asp Ile Gln
55 60 65

AGA OAA AAA GAA ATA OTO ACO ATT CGC GOT GTG OGA TTT CTO GOT TTO 355
Arg Olo Lys Glu Ile Val Thr Ile Arg Gly Val Gly Phe Leu Gly Leu
70 75 80

CAG CCO CCG AAA OCA CCO TTA AAT ATG CAA AAC AOT GOC ACT AOC ATO 403
Gln Pro Pro Lys Ala Pro Leu Asn Met Gln Asn Ser Gly Thr Ser Met
85 90 95

CGT TTA TTG OCA OGA ATT TTG OCA GCO CAG COC TTT GAG AGC OTO TTA 451
Arg Leu Leu Ala Gly Ile Leu Ala Ala Gln Arg Phe Glu Ser Val Leu
100 105 110 115

TGC OOC OAT OAA TCA TTA GAA AAA CGT CCO ATO CAG COC ATT ATT ACO 499
Cys Gly Asp Olo Ser Leu Glu Lys Arg Pro Met Gln Arg Ile Ile Thr
120 125 130

CCG CTT OTO CAA ATO GGG OCA AAA ATT OTC AOT CAC AOC AAT TTT ACO 547
Pro Leu Val Gln Met Gly Ala Lys Ile Val Ser His Ser Asn Phe Thr
135 140 145

GCO CCG TTA CAT ATT TCA OOA COC CCO CTO ACC OOC ATT GAT TAC GCG 595
Ala Pro Leu His Ile Ser Gly Arg Pro Leu Thr Gly Ile Asp Tyr Ala
150 155 160

TTA CCO CTT CCC AOC OCG CAA TTA AAA AOT TOC CTT ATT TTO OCA OOA 643
Leu Pro Leu Pro Ser Ala Gln Leu Lys Ser Cys Leu Ile Leu Ala Gly
165 170 175

TTA TTG OCT GAC GOT ACC ACO COG CTG CAT ACT TOC OOC ATC AGT COC 691
Leu Leu Ala Asp Gly Thr Thr Arg Leu His Thr Cys Gly Ile Ser Arg
180 185 190 195

GAC CAC ACG GAA COC ATO TTO CCO CTT TTT GOT OOC OCA CTT OAO ATC 739
Asp His Thr Ols Arg Met Leu Pro Leu Phe Gly Gly Ala Leu Glo Ile
200 205 210

AAG AAA GAG CAA ATA ATC OTC ACC GOT GOA CAA AAA TTO CAC GOT TOC 787
Lys Lys Glu Gln Ile Ile Val Thr Gly Gly Gln Lys Leu His Gly Cys
215 220 225

OTO CTT OAT ATT OTC GGC GAT TTG TCO GCG OCG GCO TTT TTT ATO GTT 835
Val Leu Asp Ile Val Gly Asp Leu Ser Ala Ala Ala Phe Phe Met Val
230 235 240

GCG GCT TTG ATT GCO CCG COC GCO GAA GTC GTT ATT COT AAT OTC OOC 883
Ala Ala Leu Ile Ala Pro Arg Ala Glu Val Val Ile Arg Asn Val Gly
245 250 255

ATT AAT CCG ACO COG OCG OCA ATC ATT ACT TTG TTG CAA AAA ATO GGC 931
Ile Asn Pro Thr Arg Ala Ala Ile Ile Thr Leu Leu Gln Lys Met Gly
260 265 270 275

GOA COO ATT GAA TTO CAT CAT CAO COC TTT TOO OGC GCC GAA CCG GTG 979
Gly Arg Ile Glu Leu His His Gln Arg Phe Trp Gly Ala Gin Pro Val
280 285 290

OCA GAT ATT GTT GTT TAT CAT TCA AAA TTG CGC OGC ATT ACG OTO OCG 1027
Ala Asp Ile Val Val Tyr His Ser Lys Leu Arg Gly Ile Thr Val Ala
295 300 305

CCO GAA TGG ATT GCC AAC GCO ATT OAT GAA TTO CCG ATT TTT TTT ATT 1075
Pro Glu Trp Ile Ala Asn Ala Ile Asp Gin Leu Pro Ile Phe Phe Ile
310 315 320

GCO OCA OCT TOC OCG GAA GOG ACO ACT TTT OTG OGC AAT TTG TCA GAA 1123
Ala Ala Ala Cys Ala Glu Gly Thr Thr Phe Val Gly Asn Leu Ser Glu
325 330 335

TTO COT GTG AAA GAA TCG GAT COT TTA GCO OCG ATG GCG CAA AAT TTA 1171
Leu Arg Val Lys Gln Ser Asp Arg Leu Ala Ala Met Ala Gln Asn Leu
340 345 350 355

CAA ACT TTO OOC OTO OCG TOC OAC GTT GOC GCC GAT TTT ATT CAT ATA 1219
Gln Thr Leu Gly Val Ala Cys Asp Val Gly Ala Asp Phe Ile His Ile
360 365 370

TAT GOA AGA AGC GAT CGO CAA TTT TTA CCG GCO CGG GTG AAC AGT TTT 1267
Tyr Gly Arg Ser Asp Arg Gln Phe Leu Pro Ala Arg Val Asn Ser Phe
375 380 385

OOC OAT CAT CGG ATT GCG ATO AOT TTG GCO GTG OCA GOT GTG CGC GCO 1315
Gly Asp His Arg Ile Ala Met Ser Leu Ala Val Ala Gly Val Arg Ala
390 395 400

OCA OGT GAA TTA TTO ATT GAT OAC GGC GCO GTG GCO OCG GTT TCT ATG 1363
Ala Gly Glu Leu Leu Ile Asp Asp Gly Ala Val Ala Ala Val Ser Met
405 410 415

CCG CAA TTT CGC GAT TTT GCC GCC OCA ATT OGT ATG AAT OTA OOA GAA 1411
Pro Gln Phe Arg Asp Phe Ala Ala Ala Ile Gly Met Asn Val Gly Glu
420 425 430 435

AAA OAT OCG AAA AAT TOT CAC OAT TGATGGTCCT AOCGGTGTTO GAAAAOGCAC 1465
Lys Asp Ala Lys Asn Cys His Asp
440

GOTGGCGCAA OCTT 1479



( 2 ) INFORMATION FOR SEQ ID NO:69:

( i ) SEQUENCE CHARACTERISTICS:

( A ) LENGTH: 443 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: protein

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:69:

Met Met Thr Asn Ile Trp His
1 5



Thr Ala 



Oln lie Thr lie Cp Gly Asp 
2 0 

Leu Ala Ala Lea Ala Oln Oly 

3 5 



O 1 a 
40 



S e r 
2 5 



5 0 



Asp 

65 



5 5 



lie Oln Arg Oln Ly » Olu 
70 

Oly Leu Oln Pre Pre Lya 
8 5 

Ser Met Arg Lea Len Ala 
10 0 



Ala Pro 



Oly 



Pre 
1 0 



Me t 



Thr Oln 



Cys Ala Aip Cy * Lea Ala Thr Arg Oln Ala 



lie Va I Thr 



Lea 
9 0 



I 1 e 
1 0 5 



Va 1 Ser Ala Leo 



Ser Bis Arg Ala 
3 0 

lie Arg Oly Phe 
45 



Ser 
1 5 



Lea Ala Ala Oln 



Arg 
110 



Oly 



Lea Lea 



Leo Ala 



Leo Arg Ala Leo Oly 
60 

lie Arg Gly Val Oly 
75 



Val 



Phe 
8 0 



A b n Met Oln Asn Ser Oly 



c r 
9 5 



Phe Oln 



Val Leo Cys Oly Asp Ola 
1 1 5 



As n 

1 45 

Asp 



Lea 



I 1 e 



Len 



Hi s 
2 25 



Phe 



O 1 a 
2 10 



S o r 
120 



lie lie Thr Pro Lea Val Oln Met 



Len Oln L y s Arg 
Gly Ala 



130 13 5 

Phe Thr Ala Pro Loo His 
15 0 

Tyr Ala Leo Pro Leo Pro 
1 65 

Ala Oly Leo Lea Ala Asp 
1 8 0 

Ser Arg Asp His Thr Ola 
195 



lie Ser Oly 



Ser Ala 



Lys Lys Olo Gin 
2 1 5 



Oly Cys Val Len Asp lie 
23 0 

Met Val Ala Ala Len lie 
2 4 5 



Asn Val Oly lie Asn Pro Thr 
2 6 0 

Lys Met Oly Oly Arg lie GIo 
275 



0 1 n 

1 7 0 



Oly 



Arg 
200 



Thr 
185 



Me t Leo Pro Leo 



lie lie Val 



Val Oly Asp 



Thr Oly 
2 2 0 

Leo Ser 
2 3 5 



Ala Pro 



Arg 

2 5 0 



Ar g 



Lea 
2 8 0 



Al a 
265 



Al a 



His His 



Gin Pro Val Ala Asp lie Val Val Tyr His 



2 90 



2 95 



Thr 
3 0 5 



Phe 



Leo 



G 1 n 



Val Ala Pro Olu Trp lie 
3 1 0 

Phe lie Ala Ala Ala Cys 

3 2 5 

Ser Glo Leo Arg Val Lya 
3 4 0 

Aid Len Oln Thr Leo Gly 

3 5 5 



Ala Aa n Ala 



Ala Glo 



O 1 o 



Va 1 
3 6 0 



Ser 
3 45 



Oly 

3 3 0 

Asp 
Cys 



lie His lie Tyr Gly Arg Ser Asp Arg Gin 



Pro 
1 25 



Me t Oln Arg 



Lys lie 
1 40 

Arg Pro 
15 5 



Val Ser 



Leo Lys 
Thr Arg Leo 



Leo 



Ser 



Hi s 



Thr 
Cys 



Phe 
205 



Ala Olo Val Val 



Hi s 
Oly 



Leo 
175 



Thr 
1 9 0 



Cys 
Oly 

Oly Oln Lya 



Oly 



Ala Ala Ala 



1 1 e 

2 5 5 



lie lie Thr Len 
2 70 

Gin Arg Phe Trp 
2 8 5 

Ser Lys Len Arg 
3 0 0 

lie Asp Olo Len 

3 1 5 

Thr Thr Phe Val 



Ser 



I 1 e 
1 60 



I 1 e 



Oly 



Al a 



Leo 



Phe 
2 40 

Arg 



Leo Oln 



Oly 
Oly 
Pro 



Oly 

3 3 5 



Al a 



1 1 e 



I 1 e 
3 20 



3 70 



3 7 5 



Arg Leo Ala Ala Met Ala 

3 5 0 

Asp Val Oly Ala Asp Phe 

3 6 5 

Phe Leo Pro Ala Arg Val 
3 8 0 



